Bryan Pendleton wrote:
database. As the data are published scientific data I can share my
source code and the data. Everything is available on
www.fysik.dtu.dk/~munter/derby/
Thanks for sharing your sample program. It looks like you are creating
statements, but never closing them. Each call to cDerby.createStatement()
should be matched with a call to close() on the statement.
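For example, a minimal sketch of that pattern (the query and variable names
are made up for illustration):

// java.sql.Statement / java.sql.ResultSet; cDerby is the open connection.
Statement s = cDerby.createStatement();
try {
    ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM Quaternary");
    if (rs.next()) {
        System.out.println("row count: " + rs.getInt(1));
    }
} finally {
    // Closing the statement also closes its open result set.
    s.close();
}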
Ture,
As Bryan says, you should close your statements. Even better would be to
use a PreparedStatement for all your queries (including those without
any variables).
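For a query without variables, that could look something like this (the
query and the SysID column are just examples, not taken from your schema):

// Prepared once; it can be executed many times, with only the result
// sets being opened and closed per use.
PreparedStatement latest = cDerby.prepareStatement(
        "SELECT MAX(SysID) FROM Quaternary");
ResultSet rs = latest.executeQuery();
rs.next();
int lastId = rs.getInt(1);
rs.close();
latest.close(); // when done with it for good, e.g. at shutdown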
Unless it is an application requirement, you could also make do with
just one table and use (Prepared)Statement.getGeneratedKeys() if you
need to obtain the unique identifier after insertion. Something like this:
// Note that the first marker (?) is replaced with DEFAULT.
dataInsert = cDerby.prepareStatement(
        "INSERT INTO Quaternary VALUES " +
        "(DEFAULT, ?, CURRENT_TIMESTAMP, ?, ?, ?, ?, ?, ?)",
        Statement.RETURN_GENERATED_KEYS);
...
affectedRows = dataInsert.executeUpdate();
ResultSet rs = dataInsert.getGeneratedKeys();
rs.next(); // a single generated key is expected for a one-row insert
int sysid = rs.getInt(1);
rs.close();
Don't know how important it is anymore (with newer Java versions), but
you might see a little improvement by using 'Float.valueOf(strings[3])'
instead of 'new Float(strings[3].floatValue())'.
Further, you should not have to reconnect after a few hundred
insertions. If you have to, it probably means one of two things: either
the application code is not optimal, or there is a bug in Derby.
In this case, I *guess* that not closing the statements caused the heap
to fill up. In particular, the query sorts the data with an ORDER BY.
The following excerpt of a heap histogram tells the story:
 num   #instances    #bytes  class name
----------------------------------------
   1:     1360547  54421880  org.apache.derby.impl.store.access.sort.Node
   2:         293   6282608  [Lorg.apache.derby.impl.store.access.sort.Node;
   3:       33641   3463176  <constMethodKlass>
   4:       33641   2696744  <methodKlass>
   5:        4919   2097344  [B
   6:       47123   1986192  <symbolKlass>
   7:        2547   1468640  <constantPoolKlass>
But where are all the statement objects?
I think this case deserves a little more investigation to make sure we
don't have a leak related to the Node objects in Derby.
Can anyone shed some light on this based on previous experience?
Also, the reduced insertion rate can easily be observed with the repro.
The row count is cumulative, the duration is not (i.e. each duration
printed is the time to insert 5000 rows). The numbers below are from a
run where a commit is done every 5000 rows (a sketch of such a
commit-batched loop follows after the numbers), which turned out to be
slightly worse than committing every 50 rows (the latter clocked in at
28m36s):
Connecting to the database
5000 rows inserted (23s)
10000 rows inserted (38s)
15000 rows inserted (54s)
20000 rows inserted (77s)
25000 rows inserted (93s)
30000 rows inserted (104s)
35000 rows inserted (127s)
40000 rows inserted (162s)
45000 rows inserted (186s)
50000 rows inserted (208s)
55000 rows inserted (234s)
60000 rows inserted (279s)
real 30m38.104s
user 0m20.898s
sys 0m4.452s
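For reference, the commit-batched loop mentioned above would look roughly
like this; the Measurements table, its columns, and parseInputFile() are
invented for the sketch and are not taken from the actual repro:

// Sketch only: auto-commit off, one explicit commit per commitInterval rows.
// Table, columns, and parseInputFile() are placeholders, not the repro's code.
String[][] records = parseInputFile();          // hypothetical data loading
int commitInterval = 5000;
cDerby.setAutoCommit(false);
PreparedStatement ins = cDerby.prepareStatement(
        "INSERT INTO Measurements (label, val) VALUES (?, ?)");
try {
    int rows = 0;
    for (String[] fields : records) {
        ins.setString(1, fields[0]);
        ins.setFloat(2, Float.valueOf(fields[1]).floatValue());
        ins.executeUpdate();
        if (++rows % commitInterval == 0) {
            cDerby.commit();
        }
    }
    cDerby.commit();                            // commit the trailing partial batch
} finally {
    ins.close();
}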
As can be seen from the timings above, the time it takes to insert 5000
rows rises from 23 seconds at startup to nearly 300 seconds (almost 5
minutes). This has to be investigated as well. I'll see if I can have a
look soon, but anyone else is free to check it out. If I get around to
it, I'll modify the script slightly and create a Jira issue.
Thanks Ture for reporting this.
regards,
--
Kristian