Hi,

After long hours of debugging some strange corruption issues with our H2 databases, we finally nailed the problem, and I want to share our experience here for the Googlers, Yahooers, and Bingers of the world :-)
Our problem had almost exactly the same symptoms as http://groups.google.com/group/h2-database/browse_thread/thread/03735bf5b838a033/3e8623bcc7cda4eb aka "Problem with SELECT ... ORDER BY no data, revisited!". I could pull out records by the string ids we had assigned them, but once I tried pulling them out ordered by the mtime column of the DB, I got an empty result set. This behavior was very hard to reproduce, but was sometimes triggered after we ran a series of updates on the DB. We had no errors in our logs and nothing in H2's .trace.db. Deleting the .index.db file from the DB folder would restore the database index, and the queries would work again. Obviously this was not acceptable for production.

After also trawling the client-side logs, we discovered an OutOfMemoryError. Even though it was in the client logs, it was in fact a server-side error. The trick is that RMI (yes, we use RMI for our IPC) passes any old Throwable raised in the server directly back to the calling client, including OutOfMemoryError. (This is just one of many reasons why RMI should not be used for mission-critical systems, says the guy who does it anyway.) In the client we had a simple try/catch (Throwable) around the call to the server, to guard the client against misbehaving servers. The client would only log server errors as warnings, so the fatal server error drowned in the client logs :-S

Ok. So now we had located the cause of the corrupted indexes, but what caused the DB to OOM? We had seen H2 run smoothly with only 128 MB of heap on 10M rows, and the JVM was set to -Xmx512m, so we did not anticipate memory issues. After spending some time memory-optimizing our internal DB usage, with only minimal gains, we had to look elsewhere. When I disabled the level 2 cache, everything went back to blissful stability. The H2 docs state that the L2 cache gives a performance boost on large DBs, but a penalty on smaller ones. Since we figured that 10M rows was "large", we had it enabled.
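To illustrate the client-side guard mentioned above, here is a minimal sketch (callServer() is a hypothetical stand-in for our actual RMI call, which here simply throws the same Error the server would propagate back over the wire) of how catch (Throwable) quietly demotes a fatal server-side Error to a warning:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ClientGuard {
    private static final Logger LOG = Logger.getLogger(ClientGuard.class.getName());

    // Hypothetical stand-in for the real RMI call; it throws the
    // same Error that RMI would serialize back to the caller.
    static String callServer() {
        throw new OutOfMemoryError("Java heap space (raised on the server)");
    }

    public static void main(String[] args) {
        try {
            callServer();
        } catch (Throwable t) {
            // Guards against misbehaving servers, but ALSO catches Errors
            // such as OutOfMemoryError; the fatal condition is logged as a
            // mere warning and drowns in the client logs.
            LOG.log(Level.WARNING, "Server call failed", t);
        }
        System.out.println("Client carries on as if nothing happened");
    }
}
```

A narrower catch (Exception), with Throwables of the Error family rethrown or at least logged at SEVERE, would have surfaced the server-side OOM immediately.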
However, it turned out that we got a ~50-100% performance boost by _disabling_ the H2 L2 cache :-) I know that the L2 cache uses either Soft- or WeakReferences, so the JVM should free the cached refs automagically when it is under pressure. Apparently it didn't. With VisualVM I could watch the heap size grow and grow and grow until the JVM went belly up. Without the L2 cache we are back to a steady memory usage of ~100 MB, and ~50 MB after I manually trigger a full GC from VisualVM.

This whole post was really inspired by reading the H2 1.2.125 changelog, which had the interesting point: "The database is now closed after an out of memory exception, because the database could get corrupt otherwise." We could have used that a month ago :-)

Please don't mistake this post for an H2 rant. We love H2, use it for a lot of stuff here at the State and University Library of Denmark, and will continue to do so! Thanks for your rocking work, Thomas!

--
Cheers,
Mikkel
