vodarus vodarus wrote:
    Hi,

    By using Øystein's approach I was able to get the time down to 2.4
    seconds on my machine, where the client [1] and stored procedure
    code took around 12 seconds. The best I could get on the latter,
    tweaking page cache size and page size, was around 8 seconds.

    By cheating and removing some durability guarantees, I got down to a
    best time (not quite stable) of 1.5 seconds using Øystein's suggestion.

    I was surprised by the high disk activity seen when running the
    code. Lots of writes are taking place, which I did not quite expect
    for Øystein's query. But I do not know the implementation or the
    algorithm being used.

    There also seems to be some overhead in invoking a stored procedure,
    as the client [1] code is faster. This would of course look different
    if the network JDBC driver were used, since with a stored procedure
    you wouldn't have to transfer the data over the wire.

    To me it seems what takes most of the time is updating the result table.

    So in short, no fresh ideas! Anyone else?
    I didn't try using batches for the updates, though.


    PS: Note that your pageSize setting is invalid (must be one of 4096,
    8192, 16384, or 32768) and Derby will silently ignore it and use the
    default...


-- Kristian


    [1] Note that client in this case still refers to the embedded
    driver, but the code composing the stored procedure is invoked from
    the driver side instead of "inside" the database.

Hello )))

I set pageSize to 32768, but the resulting time is still around 11-12 seconds.

Just to be sure, you did recreate the tables?
In any case, the page size would mostly help pull data in faster and that doesn't matter for this test.


What is "Øystein's approach"? Can you list the steps to get the 2.4-second time?

Øystein's approach is using the query "insert into testtotals select client, sum(order_amount) from testbig group by client;". As you state, this is not what you want in your case and it might not be applicable.

I could also get down to these times by using a HashMap to store the intermediate totals in 'calculateTotalCommon'. This does of course use more memory and might cause trouble if you don't know the number of clients in your table (i.e. whether you need 25 thousand or 100 million entries in the map).
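To illustrate the idea, here is a minimal, self-contained sketch of accumulating per-client totals in a HashMap. The class name and the hard-coded rows are purely illustrative (this is not the actual 'calculateTotalCommon' code); in the real test the rows would come from a ResultSet over the testbig table, and the map contents would afterwards be written to the totals table.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: aggregate per-client totals in memory instead of updating
// the result table row by row. Rows are (client, order_amount) pairs.
public class TotalsSketch {
    public static Map<Integer, Long> aggregate(int[][] rows) {
        Map<Integer, Long> totals = new HashMap<>();
        for (int[] row : rows) {
            int client = row[0];
            long amount = row[1];
            // merge() adds the amount to the running total for this client
            totals.merge(client, amount, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        int[][] rows = { {1, 100}, {2, 50}, {1, 25}, {3, 10}, {2, 5} };
        Map<Integer, Long> totals = aggregate(rows);
        System.out.println(totals.get(1)); // 125
        System.out.println(totals.get(2)); // 55
        System.out.println(totals.get(3)); // 10
    }
}
```

As noted, the memory cost grows with the number of distinct clients, so this only works if the map fits comfortably in the heap.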


Thanks.


PS "To me it seems what takes most of the time is updating the result table." But what is the problem there? I commit the data at the end, so the DBMS should not do any writes ...

It seems what happens is that the log buffer fills up. By increasing the log buffer, I was able to get slightly better performance. As always with tuning, it's about balance and tradeoffs. If your IO system is really good, maybe you can run with a big log buffer and get better performance. However, the effect you see from this also depends on how often you commit (the complete buffer is then flushed anyway, at least in Derby).

So, in short, experiment with the following, using either the "insert into..." query or your client code modified to somehow store the totals in memory:
 a) Log buffer size
 b) Page cache size (and JVM heap)
 c) Page size

One of my attempts looked like this:
java -Xmx512M -Dderby.storage.pageSize=32768 -Dderby.storage.logBufferSize=524288 -Dderby.storage.pageCacheSize=2500 -cp .:${JDB10413} derbytest.FatTest
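For reference, the same settings can also be placed persistently in a derby.properties file in the Derby system directory instead of being passed on the command line (property names as in the command line above; note that a system-wide pageSize only affects tables created after it is set):

```properties
# Same tuning values as the command line above, as a derby.properties
# sketch. Place the file in the Derby system directory.
derby.storage.pageSize=32768
derby.storage.logBufferSize=524288
derby.storage.pageCacheSize=2500
```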

Using your original test code I haven't been able to get lower than around 5 seconds (best), the average being somewhere around 6 seconds.


As always, you have to do your own tests on your own system to see if it is good enough for your use :) Often there are other things to consider besides performance, for instance installation and ease of use.


Does anyone have any ideas on other possible tunings?


--
Kristian
