vodarus vodarus wrote:
    Hi,

    By using Øystein's approach I was able to get the time down to 2.4
    seconds on my machine, where the client [1] and stored procedure
    code took around 12 seconds. The best I could get on the latter,
    tweaking page cache size and page size, was around 8 seconds.

    By cheating and removing some durability guarantees, I got down to a
    best time (not quite stable) of 1.5 seconds using Øystein's suggestion.

    I was surprised by the high disk activity seen when running the
    code. Lots of writes are taking place, which I did not quite expect
    for Øystein's query. But I do not know the implementation or the
    algorithm being used.

    There also seems to be some overhead in invoking a stored procedure,
    as the client [1] code is faster. This would of course look different
    if the network JDBC driver were used, since with a stored procedure
    you wouldn't have to transfer the data over the wire.

    To me it seems what takes most of the time is updating the result table.

    So in short, no fresh ideas! Anyone else?
    I didn't try using batches for the updates, though.


    PS: Note that your pageSize setting is invalid (must be one of 4096,
    8192, 16384, or 32768) and Derby will silently ignore it and use the
    default...


-- Kristian


    [1] Note that client in this case still refers to the embedded
    driver, but the code composing the stored procedure is invoked from
    the driver side instead of "inside" the database.

Hello )))

I set pageSize to 32768, but the resulting time is still around 11-12 seconds.

Just to be sure, you did recreate the tables?
In any case, the page size would mostly help pull data in faster and that doesn't matter for this test.


What is "Øystein's approach"? Can you list the steps to get the 2.4-second time?

Øystein's approach is using the query "insert into testtotals select client, sum(order_amount) from testbig group by client;". As you state, this is not what you want in your case and it might not be applicable.

I could also get down to these times by using a HashMap to store the intermediate totals in 'calculateTotalCommon'. This does of course use more memory and might cause trouble if you don't know the number of clients in your table (i.e. whether you need 25 thousand or 100 million entries in the map).
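To illustrate the idea, here is a minimal, self-contained sketch of accumulating per-client totals in a HashMap. The class name and the hard-coded rows are purely illustrative (this is not the actual 'calculateTotalCommon' code); in the real test the rows would come from a ResultSet over the testbig table, and the map contents would afterwards be written to the totals table.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: aggregate per-client totals in memory instead of updating
// the result table row by row. Rows are (client, order_amount) pairs.
public class TotalsSketch {
    public static Map<Integer, Long> aggregate(int[][] rows) {
        Map<Integer, Long> totals = new HashMap<>();
        for (int[] row : rows) {
            int client = row[0];
            long amount = row[1];
            // merge() adds the amount to the running total for this client
            totals.merge(client, amount, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        int[][] rows = { {1, 100}, {2, 50}, {1, 25}, {3, 10}, {2, 5} };
        Map<Integer, Long> totals = aggregate(rows);
        System.out.println(totals.get(1)); // 125
        System.out.println(totals.get(2)); // 55
        System.out.println(totals.get(3)); // 10
    }
}
```

As noted, the memory cost grows with the number of distinct clients, so this only works if the map fits comfortably in the heap.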


Thanks.


PS "To me it seems what takes most of the time is updating the result table." But what is the problem there? I commit the data at the end, so the DBMS should not do any writes ...

It seems what happens is that the log buffer fills up. By increasing the log buffer, I was able to get slightly better performance. As always with tuning, it's about balance and tradeoffs. If your IO system is really good, maybe you can run with a big log buffer and get better performance. However, the effect you see from this also depends on how often you commit (the complete buffer is then flushed anyway, at least in Derby).

So, in short, experiment with the following, using either the "insert into..." query or your client code modified to somehow store the totals in memory:
 a) Log buffer size
 b) Page cache size (and JVM heap)
 c) Page size

One of my attempts looked like this:
java -Xmx512M -Dderby.storage.pageSize=32768 -Dderby.storage.logBufferSize=524288 -Dderby.storage.pageCacheSize=2500 -cp .:${JDB10413} derbytest.FatTest
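For reference, the same settings can also be placed persistently in a derby.properties file in the Derby system directory instead of being passed on the command line (property names as in the command line above; note that a system-wide pageSize only affects tables created after it is set):

```properties
# Same tuning values as the command line above, as a derby.properties
# sketch. Place the file in the Derby system directory.
derby.storage.pageSize=32768
derby.storage.logBufferSize=524288
derby.storage.pageCacheSize=2500
```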

Using your original test code I haven't been able to get lower than around 5 seconds (best), the average being somewhere around 6 seconds.


As always, you have to do your own tests on your own system to see if it is good enough for your use :) Often there are other things to consider besides performance, for instance installation and ease of use.


Does anyone have any ideas on other possible tunings?


--
Kristian
