John Embretsen wrote:
Kristian Waagan wrote:
John Embretsen wrote:
Hi Ture,

[ snip ]

I didn't think it was still this easy to fill up the heap by not explicitly closing Statement objects (see e.g. DERBY-210), but there are obviously some vulnerabilities left. So thank you for sharing your code and for reporting this!

Hi John,

I think the OOME happened so fast in this application because of the sorting (ORDER BY). I don't know enough about how sorting is implemented in Derby to have an opinion, but I think we should have a look at it.

I will do a little pre-investigation and create a Jira if appropriate.

OK, thanks! From a quick read of the thread I was under the impression that adding a call to Statement.close() solved the problem, but I guess that was only part of it.

One of the DOTS tests I've been running also stumbled across a memory leak related to sorting; see http://issues.apache.org/jira/browse/DERBY-2176 . Interesting coincidence. That was fixed in 10.2.2.0, though, so this must be something else.

Hi again,

After a quick look at the repro, I have concluded that Derby does not have a bug in this area. That said, it might be possible to code very defensively and try to make up for the lack of explicit closing of resources (statements and result sets). I believe this works for the embedded case, thanks to garbage collection. For this to work in client/server mode, the server must somehow be notified that the objects on the client side have been gc'ed.

Regarding the repro, only one change is needed to make it complete the loading: close the result set 'rs' inside the loop, since the sort resources are tied to the result set. Even better, close the statement, or use a prepared statement (see the sketch below). It might or might not work when you remove the reconnection logic, depending on how many resources are tied to the statement object on the server side and how big the heap is.
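
To illustrate, here is a rough sketch of what the fixed loop could look like. This is not the original repro; the table Quaternary, the column name, and the query come from this thread, while the ORDER BY, the loop bounds, and the connection URL are made up for the example:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class ReproFixed {
        public static void main(String[] args) throws Exception {
            Connection conn =
                    DriverManager.getConnection("jdbc:derby:testdb;create=true");
            // Prepare the statement once and reuse it, instead of creating
            // a new Statement on every iteration and never closing it.
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT id FROM Quaternary WHERE name = ? ORDER BY id");
            try {
                for (int i = 0; i < 60000; i++) {
                    ps.setString(1, "name" + i);
                    ResultSet rs = ps.executeQuery();
                    try {
                        while (rs.next()) {
                            int id = rs.getInt(1);
                            // ... use id ...
                        }
                    } finally {
                        // Closing the result set inside the loop releases
                        // the sort resources tied to it on each iteration.
                        rs.close();
                    }
                }
            } finally {
                ps.close();
                conn.close();
            }
        }
    }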

Regarding the reduced insertion performance over time, this was simply caused by a missing index. After adding an index on the column 'name' in Quaternary, the times look like this on my computer:
Connecting to the database
5000 rows inserted (8s)
10000 rows inserted (5s)
15000 rows inserted (5s)
20000 rows inserted (5s)
25000 rows inserted (5s)
30000 rows inserted (5s)
35000 rows inserted (5s)
40000 rows inserted (5s)
45000 rows inserted (5s)
50000 rows inserted (5s)
55000 rows inserted (5s)
60000 rows inserted (5s)

real    1m11.429s
user    0m13.909s
sys     0m2.606s

Now that's a nice performance increase :) (from ~30 minutes down to ~1 minute). You pay for it with increased storage cost (for the index). With the index, the database doesn't have to scan the whole table on each insert to answer the 'SELECT id FROM Quaternary WHERE name=?' query.
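
For reference, the index can be added with a one-line DDL statement. The index name quaternary_name_idx and the connection URL below are my own choice; only the table and column come from this thread:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class AddIndex {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
            Statement stmt = conn.createStatement();
            // With the index in place, answering
            // 'SELECT id FROM Quaternary WHERE name = ?' is an index lookup
            // instead of a scan of the whole table on every insert.
            stmt.executeUpdate(
                    "CREATE INDEX quaternary_name_idx ON Quaternary(name)");
            stmt.close();
            conn.close();
        }
    }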



--
Kristian