John Embretsen wrote:
Kristian Waagan wrote:
John Embretsen wrote:
Hi Ture,
[ snip ]
I didn't think it was this easy to fill up the heap (by not
explicitly closing Statement objects) anymore (see e.g. DERBY-210),
but there are obviously some vulnerabilities left. So thank you for
sharing your code and for reporting this!
Hi John,
I think the OOME happened so fast in this application because of the
sorting (ORDER BY). I don't know enough about how sorting is
implemented in Derby to have an opinion, but I think we should have a
look at it.
I will do a little pre-investigation and create a Jira if appropriate.
OK, thanks! From quickly reading the thread I was under the impression
that adding a call to Statement.close() solved the problem, but I guess
that was only part of it.
One of the DOTS tests I've been running also stumbled across a memory
leak related to sorting, see
http://issues.apache.org/jira/browse/DERBY-2176 . Interesting
coincidence. That was fixed with 10.2.2.0, though, so this must be
something else.
Hi again,
After a quick look at the repro, I have concluded that Derby does not
have a bug in this area.
That said, it might be possible to code very defensively and try to make
up for the lack of explicit closing of resources (statements and
resultsets). I believe this works for the embedded case, due to garbage
collection. For this to work in client/server mode, the server must
somehow be notified that the objects on the client side have been gc'ed.
Regarding the repro, only one change is needed to make it complete the
loading: close the resultset 'rs' inside the loop. It may or may not
work when you remove the reconnection logic, depending on how many
resources are tied to the statement object on the server side and how
big the heap is.
The sort-resources are tied to the resultset.
Even better, close the statement or use a prepared statement.
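To illustrate, here is a minimal sketch of the pattern I mean (the query
is the one from the repro; the method name and loop structure are made
up):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Reuse one PreparedStatement for the whole load, and close each
    // ResultSet before the next iteration so the resources it holds on
    // the server side are released promptly.
    static void loadRows(Connection conn, String[] names)
            throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
                "SELECT id FROM Quaternary WHERE name=?");
        try {
            for (String name : names) {
                ps.setString(1, name);
                ResultSet rs = ps.executeQuery();
                try {
                    while (rs.next()) {
                        int id = rs.getInt(1);
                        // ... use id ...
                    }
                } finally {
                    rs.close(); // the one change the repro needs
                }
            }
        } finally {
            ps.close(); // releases the statement's server-side resources
        }
    }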
Regarding the reduced insertion performance over time, this was simply
caused by a missing index. With an index on the column 'name' in
Quaternary, the times look like this on my computer:
Connecting to the database
5000 rows inserted (8s)
10000 rows inserted (5s)
15000 rows inserted (5s)
20000 rows inserted (5s)
25000 rows inserted (5s)
30000 rows inserted (5s)
35000 rows inserted (5s)
40000 rows inserted (5s)
45000 rows inserted (5s)
50000 rows inserted (5s)
55000 rows inserted (5s)
60000 rows inserted (5s)
real 1m11.429s
user 0m13.909s
sys 0m2.606s
Now that's a nice performance increase :) (from ~30 minutes to ~1 minute).
And you pay for it with an increased storage cost (for the index). With
the index the database doesn't have to scan the whole table on each
insert to answer the query 'SELECT id FROM Quaternary WHERE name=?'.
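For reference, creating the index from JDBC looks something like this
(the index name is my own choice):

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    // Create an index on Quaternary.name, so the SELECT above becomes
    // an index lookup instead of a full table scan.
    static void createNameIndex(Connection conn) throws SQLException {
        Statement s = conn.createStatement();
        try {
            s.executeUpdate(
                    "CREATE INDEX quaternary_name ON Quaternary(name)");
        } finally {
            s.close();
        }
    }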
--
Kristian