[
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261799#comment-13261799
]
Zhihong Yu commented on HBASE-5611:
-----------------------------------
{code}
+ // global memstore size once a region opening failed.
{code}
'region opening failed' -> 'region failed opening'.
{code}
+ private final ConcurrentMap<HRegionInfo, AtomicLong> replayEditsPerRegion =
{code}
Do we need HRegionInfo as the key to the Map? Can we use the region name instead?
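For illustration only, a name-keyed variant could look roughly like this (class fragment; the field name follows the quoted patch, but the String key and the helper are assumptions, not the patch's code):
{code}
// Sketch: key the per-region counters by encoded region name instead of HRegionInfo.
private final ConcurrentMap<String, AtomicLong> replayEditsPerRegion =
    new ConcurrentHashMap<String, AtomicLong>();

// Hypothetical helper: fetch (or lazily create) the counter for a region.
private AtomicLong getOrCreateReplayEditsSize(HRegionInfo hri) {
  String key = hri.getEncodedName();
  AtomicLong size = replayEditsPerRegion.get(key);
  if (size == null) {
    AtomicLong created = new AtomicLong(0);
    AtomicLong existing = replayEditsPerRegion.putIfAbsent(key, created);
    size = (existing == null) ? created : existing;
  }
  return size;
}
{code}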
For rollbackRegionReplayEditsSize():
{code}
+ addAndGetGlobalMemstoreSize(-replayEdistsSize.get());
+ clearRegionReplayEditsSize(hri);
{code}
I suggest remembering the value of -replayEdistsSize.get() in a variable so
that we can exchange the order of the two statements above and return directly
from the if block.
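For example, something along these lines (just a sketch; identifier spellings follow the quoted snippets and the surrounding method shape is assumed):
{code}
public void rollbackRegionReplayEditsSize(HRegionInfo hri) {
  AtomicLong replayEdistsSize = replayEditsPerRegion.get(hri);
  if (replayEdistsSize != null) {
    long rollbackSize = -replayEdistsSize.get();  // remember the delta first
    clearRegionReplayEditsSize(hri);              // then drop the per-region entry
    addAndGetGlobalMemstoreSize(rollbackSize);    // adjust the global size last
    return;                                       // return directly from the if block
  }
}
{code}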
If replayEdistsSize is null, would that indicate a race condition?
> Replayed edits from regions that failed to open during recovery aren't
> removed from the global MemStore size
> ------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-5611
> URL: https://issues.apache.org/jira/browse/HBASE-5611
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.6
> Reporter: Jean-Daniel Cryans
> Assignee: Jieshan Bean
> Priority: Critical
> Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-5611-trunk.patch
>
>
> This bug is rather easy to hit if the {{TimeoutMonitor}} is on; otherwise I think
> it's still possible to hit if a region fails to open for more obscure
> reasons like HDFS errors.
> Consider a region that just went through distributed splitting and that's now
> being opened by a new RS. The first thing it does is to read the recovery
> files and put the edits in the {{MemStores}}. If this process takes a long
> time, the master will move that region away. At that point the edits are
> still accounted for in the global {{MemStore}} size but they are dropped when
> the {{HRegion}} gets cleaned up. The problem stays completely invisible until the
> {{MemStoreFlusher}} needs to force flush a region and finds that none of them have
> any edits:
> {noformat}
> 2012-03-21 00:33:39,303 DEBUG
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up
> because memory above low water=5.9g
> 2012-03-21 00:33:39,303 ERROR
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed
> for entry null
> java.lang.IllegalStateException
> at
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
> at java.lang.Thread.run(Thread.java:662)
> {noformat}
> The {{null}} here is a region. In my case I had so many edits in the
> {{MemStore}} during recovery that I'm over the low barrier although in fact
> I'm at 0. It happened yesterday and it's still printing this out.
> To fix this we need to be able to decrease the global {{MemStore}} size when
> the region can't open.
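> For illustration, a rough sketch of that rollback (the rollbackRegionReplayEditsSize call follows the attached patch's naming; the open-path wiring and the RegionServerAccounting placement shown here are assumptions, not the actual patch):
> {code}
> // Sketch only: if the region fails to open, undo what its replayed edits
> // added to the global MemStore size before the HRegion is thrown away.
> void openRegionWithRollback(HRegion region, RegionServerAccounting accounting)
>     throws IOException {
>   HRegionInfo hri = region.getRegionInfo();
>   try {
>     region.initialize();  // replays recovered edits into the MemStore
>   } catch (Throwable t) {
>     // The region never came online; its MemStore is simply dropped, so the
>     // global accounting must be decremented by the same amount.
>     accounting.rollbackRegionReplayEditsSize(hri);
>     throw new IOException("Failed to open " + hri.getRegionNameAsString(), t);
>   }
> }
> {code}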