Bryan Pendleton wrote:
database. As the data are published scientific data I can share my
source code and the data. Everything is available on
www.fysik.dtu.dk/~munter/derby/
Thanks for sharing your sample program. It looks like you are creating
statements, but never closing them. Each call to cDerby.createStatement()
should be matched with a call to close() on the statement.
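For example, a minimal sketch of that pattern (the query and variable names
are made up for illustration):

// java.sql.Statement / java.sql.ResultSet; cDerby is the open connection.
Statement s = cDerby.createStatement();
try {
    ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM Quaternary");
    if (rs.next()) {
        System.out.println("row count: " + rs.getInt(1));
    }
} finally {
    // Closing the statement also closes its open result set.
    s.close();
}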
Ture,
As Bryan says, you should close your statements. Even better would be to
use a PreparedStatement for all your queries (including those without
any variables).
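For a query without variables, that could look something like this (the
query and the SysID column are just examples, not taken from your schema):

// Prepared once; it can be executed many times, with only the result
// sets being opened and closed per use.
PreparedStatement latest = cDerby.prepareStatement(
        "SELECT MAX(SysID) FROM Quaternary");
ResultSet rs = latest.executeQuery();
rs.next();
int lastId = rs.getInt(1);
rs.close();
latest.close(); // when done with it for good, e.g. at shutdown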
Unless it is an application requirement, you could also make do with
just one table and use (Prepared)Statement.getGeneratedKeys() if you
need to obtain the unique identifier after insertion. Something like this:
// Note that the first marker (?) is replaced with DEFAULT.
dataInsert = cDerby.prepareStatement(
        "INSERT INTO Quaternary VALUES " +
        "(DEFAULT, ?, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?)",
        Statement.RETURN_GENERATED_KEYS);
...
affectedRows = dataInsert.executeUpdate();
ResultSet rs = dataInsert.getGeneratedKeys();
rs.next(); // a single generated key is expected for a one-row insert
int sysid = rs.getInt(1);
rs.close();
Don't know how important it is anymore (with newer Java versions), but
you might see a little improvement by using 'Float.valueOf(strings[3])'
instead of 'new Float(strings[3].floatValue())'.
Further, you should not have to reconnect after a few hundred
insertions. If you have to, it probably means one of two things: either
the application code is not optimal, or there is a bug in Derby.
In this case, I *guess* that not closing the statements caused the heap
to fill up. In particular, the query sorts the data with an ORDER BY.
The following excerpt of a heap histogram tells the story:
 num   #instances    #bytes  class name
----------------------------------------
   1:     1360547  54421880  org.apache.derby.impl.store.access.sort.Node
   2:         293   6282608  [Lorg.apache.derby.impl.store.access.sort.Node;
   3:       33641   3463176  <constMethodKlass>
   4:       33641   2696744  <methodKlass>
   5:        4919   2097344  [B
   6:       47123   1986192  <symbolKlass>
   7:        2547   1468640  <constantPoolKlass>
But where are all the statement objects?
I think this case deserves a little more investigation to make sure we
don't have a leak related to the Node objects in Derby.
Can anyone shed some light on this based on previous experience?
Also, the reduced insertion rate can easily be observed with the repro.
The row count is cumulative, the duration is not (i.e. each duration
printed is the time to insert 5000 rows). The numbers below are from a
run where a commit is done every 5000 rows (a sketch of such a
commit-batched loop follows after the numbers), which turned out to be
slightly worse than committing every 50 rows (the latter clocked in at
28m36s):
Connecting to the database
5000 rows inserted (23s)
10000 rows inserted (38s)
15000 rows inserted (54s)
20000 rows inserted (77s)
25000 rows inserted (93s)
30000 rows inserted (104s)
35000 rows inserted (127s)
40000 rows inserted (162s)
45000 rows inserted (186s)
50000 rows inserted (208s)
55000 rows inserted (234s)
60000 rows inserted (279s)
real 30m38.104s
user 0m20.898s
sys 0m4.452s
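For reference, the commit-batched loop mentioned above would look roughly
like this; the Measurements table, its columns, and parseInputFile() are
invented for the sketch and are not taken from the actual repro:

// Sketch only: auto-commit off, one explicit commit per commitInterval rows.
// Table, columns, and parseInputFile() are placeholders, not the repro's code.
String[][] records = parseInputFile();          // hypothetical data loading
int commitInterval = 5000;
cDerby.setAutoCommit(false);
PreparedStatement ins = cDerby.prepareStatement(
        "INSERT INTO Measurements (label, val) VALUES (?, ?)");
try {
    int rows = 0;
    for (String[] fields : records) {
        ins.setString(1, fields[0]);
        ins.setFloat(2, Float.valueOf(fields[1]).floatValue());
        ins.executeUpdate();
        if (++rows % commitInterval == 0) {
            cDerby.commit();
        }
    }
    cDerby.commit();                            // commit the trailing partial batch
} finally {
    ins.close();
}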
As can be seen from the timings above, the time it takes to insert 5000
rows rises from 23 seconds at startup to nearly 300 seconds (almost 5
minutes). This has to be investigated as well. I'll see if I can have a
look soon, but anyone else is free to check it out. If I get around to
it, I'll modify the script slightly and create a Jira issue.
Thanks Ture for reporting this.
regards,
--
Kristian