We are using H2 in production with large data volume and frequent
reads/writes. And the system ran very slow at certain point. We took a
thread dump and found the process spends most of the time in putting pages
in the cache. A sample thread dump is shown below
java.lang.Thread.State: RUNNABLE
at org.h2.util.CacheLRU.removeOld(CacheLRU.java:186)
at org.h2.util.CacheLRU.removeOldIfRequired(CacheLRU.java:138)
at org.h2.util.CacheLRU.put(CacheLRU.java:113)
at org.h2.store.PageStore.getPage(PageStore.java:829)
- locked <86aca09> (a org.h2.store.PageStore)
at org.h2.index.PageDataIndex.getPage(PageDataIndex.java:231)
at org.h2.index.PageDataNode.getNextPage(PageDataNode.java:231)
at org.h2.index.PageDataLeaf.getNextPage(PageDataLeaf.java:396)
at org.h2.index.PageDataCursor.nextRow(PageDataCursor.java:93)
at org.h2.index.PageDataCursor.next(PageDataCursor.java:52)
at org.h2.index.IndexCursor.next(IndexCursor.java:241)
at org.h2.table.TableFilter.next(TableFilter.java:359)
at org.h2.command.dml.Select.queryFlat(Select.java:518)
at org.h2.command.dml.Select.queryWithoutCache(Select.java:625)
at org.h2.command.dml.Query.query(Query.java:314)
at org.h2.command.dml.Query.query(Query.java:284)
at org.h2.command.dml.Query.query(Query.java:36)
at org.h2.command.CommandContainer.query(CommandContainer.java:91)
at org.h2.command.Command.executeQuery(Command.java:195)
- locked <1ef2e1b> (a org.h2.engine.Database)
at
org.h2.jdbc.JdbcPreparedStatement.executeQuery(JdbcPreparedStatement.java:106)
- locked <1af61930> (a org.h2.engine.Session)
at
org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:93)
at
org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:93)
....
After digging the source code for a while, we have come to a conclusion
that the issue is most likely caused by PageStore's changeCount being an
int and the value has flipped into negative territory. Our theory is that
in org.h2.util.CacheLRU.removeOld
if (!check.canRemove()) {
removeFromLinkedList(check);
addToFront(check);
continue;
}
the above code will check if the current cached object can be removed, and
since the object is a page, it in turn calls
org.h2.index.PageData
public boolean canRemove() {
if (changeCount >= index.getPageStore().getChangeCount()) {
return false;
}
return true;
}
here it compares the page's changeCount with the PageStore's changeCount.
Both of these variables are of int type. And since we have very frequent
transaction updates the PageStore's changeCount grows rapidly and
eventually exceeds 2^31 and becomes negative. This results in the above
PageData.canRemove always return false. And since we have a relatively
large cache with more than 200k pages, CacheLRU's removeOld method
basically loops through the cache object list and find nothing removable so
it just burns the CPU. What makes it worse is that we have one particular
SQL that loads more than 1million records from H2, and when that SQL is
executed, we are running the removeOld method 1 million times (in the worst
case), each time checking the 200k pages in the cache and ended up with
nothing but just waste CPU. Worse; since this method locks up the
Database/PageStore, all other threads are being blocked.
Please confirm our theory is correct, and we'd really appreciate a quick
fix on this. For now, we have to restart our system every night so
PageStore's changeCount wound not turn negative.
A possible 'fix' might be changing the changeCount in PageStore/PageData to
long? Of course, this is still not the perfect fix, but I guess for most
mundane systems we will not get to see a long value turning negative before
its EOL or a scheduled restart
Thanks
Ye
--
You received this message because you are subscribed to the Google Groups "H2
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/h2-database?hl=en-US.
For more options, visit https://groups.google.com/groups/opt_out.