Hi,

After long hours of debugging some strange corruption issues with our H2 databases, we finally nailed the problem, and I want to share our experience here for the Googlers, Yahooers, and Bingers of the world :-)
Our problem had almost exactly the same symptoms as http://groups.google.com/group/h2-database/browse_thread/thread/03735bf5b838a033/3e8623bcc7cda4eb aka "Problem with SELECT ... ORDER BY no data, revisited!". I could pull out records by the string ids we had assigned them, but once I tried pulling them out ordered by the mtime column of the DB, I got an empty result set. This behavior was very hard to reproduce, but was sometimes triggered after we ran a series of updates on the DB. We had no errors in our logs and nothing in H2's .trace.db. Deleting the .index.db file from the DB folder would restore the database index, and the queries would work again. Obviously this was not acceptable for production.

After also trawling the client-side logs, we discovered an OutOfMemoryError. Even though it was in the client logs, it was in fact a server-side error. The trick is that RMI (yes, we use RMI for our IPC) passes any old Throwable raised in the server directly back to the calling client, including OutOfMemoryError. (This is just one of many reasons why RMI should not be used for mission-critical systems, says the guy who does it anyway.) In the client we had a simple try/catch (Throwable) around the call to the server, to guard the client against misbehaving servers. The client would only log server errors as warnings, so the fatal server error drowned in the client logs :-S

Ok. So now we had located the cause of the corrupted indexes, but what caused the DB to OOM? We had seen H2 run smoothly with only 128 MB of heap on 10M rows, and the JVM was set to -Xmx512m, so we did not anticipate memory issues. After spending some time memory-optimizing our internal DB usage, with only minimal gains, we had to look elsewhere. When I disabled the level 2 cache, everything went back to blissful stability. The H2 docs state that the L2 cache gives a performance boost on large DBs, but a penalty on smaller ones. Since we figured that 10M rows was "large", we had it enabled.
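To illustrate the client-side guard mentioned above, here is a minimal sketch (callServer() is a hypothetical stand-in for our actual RMI call, which here simply throws the same Error the server would propagate back over the wire) of how catch (Throwable) quietly demotes a fatal server-side Error to a warning:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class ClientGuard {
    private static final Logger LOG = Logger.getLogger(ClientGuard.class.getName());

    // Hypothetical stand-in for the real RMI call; it throws the
    // same Error that RMI would serialize back to the caller.
    static String callServer() {
        throw new OutOfMemoryError("Java heap space (raised on the server)");
    }

    public static void main(String[] args) {
        try {
            callServer();
        } catch (Throwable t) {
            // Guards against misbehaving servers, but ALSO catches Errors
            // such as OutOfMemoryError; the fatal condition is logged as a
            // mere warning and drowns in the client logs.
            LOG.log(Level.WARNING, "Server call failed", t);
        }
        System.out.println("Client carries on as if nothing happened");
    }
}
```

A narrower catch (Exception), with Throwables of the Error family rethrown or at least logged at SEVERE, would have surfaced the server-side OOM immediately.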
However, it turned out that we got a ~50-100% performance boost by _disabling_ the H2 L2 cache :-) I know that the L2 cache uses either Soft- or WeakReferences, so the JVM should free the cached refs automagically when it is under pressure. Apparently it didn't. With VisualVM I could watch the heap size grow and grow and grow until the JVM went belly up. Without the L2 cache we are back to a steady memory usage of ~100 MB, and ~50 MB after I manually trigger a full GC from VisualVM.

This whole post was really inspired by reading the H2 1.2.125 changelog, which had the interesting point: "The database is now closed after an out of memory exception, because the database could get corrupt otherwise." We could have used that a month ago :-)

Please don't mistake this post for an H2 rant. We love H2, use it for a lot of stuff here at the State and University Library of Denmark, and will continue to do so! Thanks for your rocking work, Thomas!

--
Cheers,
Mikkel
