We are using H2 in production with large data volume and frequent 
reads/writes. And the system ran very slow at certain point. We took a 
thread dump and found the process spends most of the time in putting pages 
in the cache. A sample thread dump is shown below

   java.lang.Thread.State: RUNNABLE
         at org.h2.util.CacheLRU.removeOld(CacheLRU.java:186)
         at org.h2.util.CacheLRU.removeOldIfRequired(CacheLRU.java:138)
         at org.h2.util.CacheLRU.put(CacheLRU.java:113)
         at org.h2.store.PageStore.getPage(PageStore.java:829)
         - locked <86aca09> (a org.h2.store.PageStore)
         at org.h2.index.PageDataIndex.getPage(PageDataIndex.java:231)
         at org.h2.index.PageDataNode.getNextPage(PageDataNode.java:231)
         at org.h2.index.PageDataLeaf.getNextPage(PageDataLeaf.java:396)
         at org.h2.index.PageDataCursor.nextRow(PageDataCursor.java:93)
         at org.h2.index.PageDataCursor.next(PageDataCursor.java:52)
         at org.h2.index.IndexCursor.next(IndexCursor.java:241)
         at org.h2.table.TableFilter.next(TableFilter.java:359)
         at org.h2.command.dml.Select.queryFlat(Select.java:518)
         at org.h2.command.dml.Select.queryWithoutCache(Select.java:625)
         at org.h2.command.dml.Query.query(Query.java:314)
         at org.h2.command.dml.Query.query(Query.java:284)
         at org.h2.command.dml.Query.query(Query.java:36)
         at org.h2.command.CommandContainer.query(CommandContainer.java:91)
         at org.h2.command.Command.executeQuery(Command.java:195)
         - locked <1ef2e1b> (a org.h2.engine.Database)
         at 
org.h2.jdbc.JdbcPreparedStatement.executeQuery(JdbcPreparedStatement.java:106)
         - locked <1af61930> (a org.h2.engine.Session)
         at 
org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:93)
         at 
org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:93)
....

After digging the source code for a while, we have come to a conclusion 
that the issue is most likely caused by PageStore's changeCount being an 
int and the value has flipped into negative territory. Our theory is that

in org.h2.util.CacheLRU.removeOld

           if (!check.canRemove()) {
                removeFromLinkedList(check);
                addToFront(check);
                continue;
            }

the above code will check if the current cached object can be removed, and 
since the object is a page, it in turn calls


org.h2.index.PageData 


    public boolean canRemove() {
        if (changeCount >= index.getPageStore().getChangeCount()) {
            return false;
        }
        return true;
    }


here it compares the page's changeCount with the PageStore's changeCount. 
Both of these variables are of int type. And since we have very frequent 
transaction updates the PageStore's changeCount grows rapidly and 
eventually exceeds 2^31 and becomes negative. This results in the above 
PageData.canRemove always return false. And since we have a relatively 
large cache with more than 200k pages, CacheLRU's removeOld method 
basically loops through the cache object list and find nothing removable so 
it just burns the CPU. What makes it worse is that we have one particular 
SQL that loads more than 1million records from H2, and when that SQL is 
executed, we are running the removeOld method 1 million times (in the worst 
case), each time checking the 200k pages in the cache and ended up with 
nothing but just waste CPU. Worse; since this method locks up the 
Database/PageStore, all other threads are being blocked.

Please confirm our theory is correct, and we'd really appreciate a quick 
fix on this. For now, we have to restart our system every night so 
PageStore's changeCount wound not turn negative. 

A possible 'fix' might be changing the changeCount in PageStore/PageData to 
long? Of course, this is still not the perfect fix, but I guess for most 
mundane systems we will not get to see a long value turning negative before 
its EOL or a scheduled restart

Thanks

Ye

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/h2-database?hl=en-US.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to