Replayed edits from regions that failed to open during recovery aren't removed 
from the global MemStore size
------------------------------------------------------------------------------------------------------------

                 Key: HBASE-5611
                 URL: https://issues.apache.org/jira/browse/HBASE-5611
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.6
            Reporter: Jean-Daniel Cryans
            Priority: Critical
             Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1


This bug is rather easy to hit if the {{TimeoutMonitor}} is on; otherwise I 
think it's still possible to hit it if a region fails to open for more obscure 
reasons like HDFS errors.

Consider a region that just went through distributed log splitting and that's 
now being opened by a new RS. The first thing it does is read the recovery 
files and put the edits in the {{MemStores}}. If this process takes a long 
time, the master will move that region away. At that point the edits are still 
accounted for in the global {{MemStore}} size, but they are dropped when the 
{{HRegion}} gets cleaned up. The problem is completely invisible until the 
{{MemStoreFlusher}} needs to force flush a region and none of them have edits:

{noformat}
2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g
2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

The {{null}} here is a region. In my case I had so many edits in the 
{{MemStore}} during recovery that the accounted size is over the low barrier, 
although in fact I'm at 0. It happened yesterday and it's still printing this 
out.

To fix this we need to be able to decrease the global {{MemStore}} size when 
the region can't open.
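
As a rough illustration of that idea, here is a minimal, self-contained sketch. It is not the HBase code or the eventual patch: {{GlobalMemStoreAccounting}}, {{RecoveringRegion}}, {{RegionOpener}} and the {{failAfterReplay}} flag are all hypothetical names made up for this example. It assumes each region remembers how many bytes it added to the global counter while replaying recovered edits, so that a failed open can subtract exactly that amount.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Small demo driver; every class in this file is a hypothetical stand-in. */
final class MemStoreAccountingDemo {
    public static void main(String[] args) {
        GlobalMemStoreAccounting accounting = new GlobalMemStoreAccounting();
        RecoveringRegion region = new RecoveringRegion(accounting);
        // Simulate an open that replays two edits and then gets aborted.
        boolean opened = RegionOpener.open(region, new long[] {100L, 250L}, true);
        System.out.println("opened=" + opened + " globalSize=" + accounting.get());
    }
}

/** Hypothetical stand-in for the region server's global MemStore size counter. */
final class GlobalMemStoreAccounting {
    private final AtomicLong globalSize = new AtomicLong();

    long add(long delta) {
        return globalSize.addAndGet(delta);
    }

    long get() {
        return globalSize.get();
    }
}

/** Hypothetical region that tracks what it contributed during replay. */
final class RecoveringRegion {
    private final GlobalMemStoreAccounting accounting;
    private long replayedBytes;

    RecoveringRegion(GlobalMemStoreAccounting accounting) {
        this.accounting = accounting;
    }

    /** Replaying an edit grows both the per-region tally and the global size. */
    void replayEdit(long editSize) {
        replayedBytes += editSize;
        accounting.add(editSize);
    }

    /** Undo this region's replay contribution to the global size. */
    void rollbackReplayAccounting() {
        accounting.add(-replayedBytes);
        replayedBytes = 0;
    }
}

/** Hypothetical open path: roll back the accounting if the open does not complete. */
final class RegionOpener {
    static boolean open(RecoveringRegion region, long[] editSizes, boolean failAfterReplay) {
        try {
            for (long size : editSizes) {
                region.replayEdit(size);
            }
            if (failAfterReplay) {
                // Stands in for the master moving the region away, an HDFS error, etc.
                throw new IllegalStateException("region open aborted (simulated)");
            }
            return true;
        } catch (RuntimeException e) {
            // Without this call the replayed bytes stay in the global size forever.
            region.rollbackReplayAccounting();
            return false;
        }
    }
}
```

The point of keeping a per-region tally is that the rollback needs no knowledge of what other regions have added; removing the {{rollbackReplayAccounting()}} call in the {{catch}} block reproduces the leak described above.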
