John Embretsen wrote:
Kristian Waagan wrote:
John Embretsen wrote:
Hi Ture,

[ snip ]

I didn't think it was still this easy to fill up the heap by not explicitly closing Statement objects (see e.g. DERBY-210), but there are obviously some vulnerabilities left. So thank you for sharing your code and for reporting this!

Hi John,

I think the OOME happened so fast in this application because of the sorting (ORDER BY). I don't know enough about how sorting is implemented in Derby to have an opinion, but I think we should have a look at it.

I will do a little pre-investigation and create a Jira if appropriate.

OK, thanks! From a quick read of the thread I was under the impression that adding a call to Statement.close() solved the problem, but I guess that was only part of it.

One of the DOTS tests I've been running also stumbled across a memory leak related to sorting; see http://issues.apache.org/jira/browse/DERBY-2176 . Interesting coincidence. That was fixed in 10.2.2.0, though, so this must be something else.

Hi again,

After a quick look at the repro, I have concluded that Derby does not have a bug in this area. That said, it might be possible to code very defensively and try to make up for the lack of explicit closing of resources (statements and result sets). I believe this works for the embedded case, thanks to garbage collection. For this to work in client/server mode, the server must somehow be notified that the objects on the client side have been gc'ed.

Regarding the repro, only one change is needed to make it complete the loading: close the result set 'rs' inside the loop, since the sort resources are tied to the result set. Even better, close the statement, or use a prepared statement (see the sketch below). It might or might not work when you remove the reconnection logic, depending on how many resources are tied to the statement object on the server side and how big the heap is.
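
To illustrate, here is a rough sketch of what the fixed loop could look like. This is not the original repro; the table Quaternary, the column name, and the query come from this thread, while the ORDER BY, the loop bounds, and the connection URL are made up for the example:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ReproFixed {
        public static void main(String[] args) throws Exception {
            Connection conn =
                    DriverManager.getConnection("jdbc:derby:testdb;create=true");
            // Prepare the statement once and reuse it, instead of creating
            // a new Statement on every iteration and never closing it.
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT id FROM Quaternary WHERE name = ? ORDER BY id");
            try {
                for (int i = 0; i < 60000; i++) {
                    ps.setString(1, "name" + i);
                    ResultSet rs = ps.executeQuery();
                    try {
                        while (rs.next()) {
                            int id = rs.getInt(1);
                            // ... use id ...
                        }
                    } finally {
                        // Closing the result set inside the loop releases
                        // the sort resources tied to it on each iteration.
                        rs.close();
                    }
                }
            } finally {
                ps.close();
                conn.close();
            }
        }
    }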

Regarding the reduced insertion performance over time, this was simply caused by a missing index. After adding an index on the column 'name' in Quaternary, the times look like this on my computer:
Connecting to the database
5000 rows inserted (8s)
10000 rows inserted (5s)
15000 rows inserted (5s)
20000 rows inserted (5s)
25000 rows inserted (5s)
30000 rows inserted (5s)
35000 rows inserted (5s)
40000 rows inserted (5s)
45000 rows inserted (5s)
50000 rows inserted (5s)
55000 rows inserted (5s)
60000 rows inserted (5s)

real    1m11.429s
user    0m13.909s
sys     0m2.606s

Now that's a nice performance increase :) (from ~30 minutes down to ~1 minute). You pay for it with increased storage cost (for the index). With the index, the database doesn't have to scan the whole table on each insert to answer the 'SELECT id FROM Quaternary WHERE name=?' query.
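
For reference, the index can be added with a one-line DDL statement. The index name quaternary_name_idx and the connection URL below are my own choice; only the table and column come from this thread:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AddIndex {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
            Statement stmt = conn.createStatement();
            // With the index in place, answering
            // 'SELECT id FROM Quaternary WHERE name = ?' is an index lookup
            // instead of a scan of the whole table on every insert.
            stmt.executeUpdate(
                    "CREATE INDEX quaternary_name_idx ON Quaternary(name)");
            stmt.close();
            conn.close();
        }
    }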



--
Kristian