Replayed edits from regions that failed to open during recovery aren't removed
from the global MemStore size
------------------------------------------------------------------------------------------------------------
Key: HBASE-5611
URL: https://issues.apache.org/jira/browse/HBASE-5611
Project: HBase
Issue Type: Bug
Affects Versions: 0.90.6
Reporter: Jean-Daniel Cryans
Priority: Critical
Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
This bug is rather easy to hit if the {{TimeoutMonitor}} is on. Otherwise I think
it's still possible to hit it if a region fails to open for more obscure
reasons, like HDFS errors.
Consider a region that just went through distributed splitting and that's now
being opened by a new RS. The first thing the RS does is read the recovery files
and put the edits in the {{MemStores}}. If this process takes a long time, the
master will move that region away. At that point the edits are still accounted
for in the global {{MemStore}} size, but they are dropped when the {{HRegion}}
gets cleaned up. The problem is completely invisible until the
{{MemStoreFlusher}} needs to force flush a region and none of them have edits:
{noformat}
2012-03-21 00:33:39,303 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=5.9g
2012-03-21 00:33:39,303 ERROR org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed for entry null
java.lang.IllegalStateException
        at com.google.common.base.Preconditions.checkState(Preconditions.java:129)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
        at java.lang.Thread.run(Thread.java:662)
{noformat}
The {{null}} here is a region. In my case so many edits went into the
{{MemStore}} during recovery that the global size is over the low barrier,
although the real usage is 0. It happened yesterday and it is still printing
this out.
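To make the mismatch concrete, here is a minimal Java sketch of the accounting problem. It is not HBase code: {{LeakySizeAccounting}}, its fields and methods, and the 100-byte low-water mark are all hypothetical stand-ins chosen for illustration. It only models the idea that replayed edits are added to a global counter that nothing decrements when the open is abandoned.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the region server's size accounting (not HBase code).
class LeakySizeAccounting {
    // Global counter that the flusher compares against the low-water mark.
    final AtomicLong globalMemStoreSize = new AtomicLong();
    // MemStore size of each region that is actually online.
    final Map<String, Long> onlineRegions = new HashMap<>();
    final long lowWaterMark;

    LeakySizeAccounting(long lowWaterMark) {
        this.lowWaterMark = lowWaterMark;
    }

    // Replay during open: edits are counted globally before the region is online.
    void replayEdits(long bytes) {
        globalMemStoreSize.addAndGet(bytes);
    }

    // The open is abandoned and the region's edits are dropped,
    // but nothing subtracts them from the global counter. This is the leak.
    void abandonOpen() {
        // intentionally leaves globalMemStoreSize untouched
    }

    boolean aboveLowWater() {
        return globalMemStoreSize.get() >= lowWaterMark;
    }

    // Returns the online region holding the most edits, or null if none holds any.
    String pickRegionToFlush() {
        String best = null;
        long bestSize = 0;
        for (Map.Entry<String, Long> e : onlineRegions.entrySet()) {
            if (e.getValue() > bestSize) {
                best = e.getKey();
                bestSize = e.getValue();
            }
        }
        return best;
    }
}

class LeakySizeAccountingDemo {
    public static void main(String[] args) {
        LeakySizeAccounting acct = new LeakySizeAccounting(100);
        acct.onlineRegions.put("regionA", 0L); // online, but holds no edits
        acct.replayEdits(150);                 // recovery replay for another region
        acct.abandonOpen();                    // that region is moved away
        System.out.println("above low water: " + acct.aboveLowWater());
        System.out.println("region to flush: " + acct.pickRegionToFlush());
    }
}
```

In this sketch the counter stays above the low-water mark forever while there is no region left to flush, which mirrors the repeated "entry null" error in the log.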
To fix this we need to be able to decrease the global {{MemStore}} size when
the region can't open.
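One possible shape for such a fix, sketched in Java under stated assumptions: each opening region remembers how many bytes its replay added to the global counter, and subtracts exactly that amount if the open fails or is cancelled. The names {{GlobalMemStoreCounter}}, {{OpeningRegion}} and {{rollbackReplayedSize}} are hypothetical and are not taken from the HBase code base or from the actual patch. The sketch also assumes a single thread performs the replay for a given region.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical global counter shared by all regions on a server (not HBase code).
class GlobalMemStoreCounter {
    private final AtomicLong size = new AtomicLong();

    long add(long delta) {
        return size.addAndGet(delta);
    }

    long get() {
        return size.get();
    }
}

// Hypothetical per-region bookkeeping for an open that is still in progress.
class OpeningRegion {
    private final GlobalMemStoreCounter global;
    // Bytes this region's replay has added to the global counter so far.
    private long replayedBytes;

    OpeningRegion(GlobalMemStoreCounter global) {
        this.global = global;
    }

    // Account for one replayed edit both locally and globally.
    void replayEdit(long bytes) {
        replayedBytes += bytes;
        global.add(bytes);
    }

    // Call when the open fails or is cancelled: give back what the replay added.
    void rollbackReplayedSize() {
        global.add(-replayedBytes);
        replayedBytes = 0;
    }
}

class RollbackDemo {
    public static void main(String[] args) {
        GlobalMemStoreCounter global = new GlobalMemStoreCounter();
        OpeningRegion region = new OpeningRegion(global);
        try {
            region.replayEdit(150);
            // Simulated failure, e.g. the master moved the region away.
            throw new IllegalStateException("open cancelled");
        } catch (IllegalStateException e) {
            region.rollbackReplayedSize();
        }
        System.out.println("global size after failed open: " + global.get());
    }
}
```

Resetting the local count after the rollback makes a second call harmless, so a cleanup path that runs twice would not push the global counter below its true value.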