[jira] [Commented] (HBASE-5611) Replayed edits from regions that failed to open during recovery aren't removed from the global MemStore size

Zhihong Yu (JIRA) Thu, 26 Apr 2012 07:56:41 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262652#comment-13262652
 ]


Zhihong Yu commented on HBASE-5611:
-----------------------------------

Patch v2 looks good in general.
Comment on formatting:
{code}
+   * @param regionName
+   *          region name.
{code}
The line length limit is 100 characters, so please put the Javadoc description 
on the same line as the param name.

You can wait for the Hadoop QA result to come back before attaching new patches.
                
> Replayed edits from regions that failed to open during recovery aren't 
> removed from the global MemStore size
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5611
>                 URL: https://issues.apache.org/jira/browse/HBASE-5611
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.6
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jieshan Bean
>            Priority: Critical
>             Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>
>         Attachments: HBASE-5611-94.patch, HBASE-5611-trunk-v2.patch, 
> HBASE-5611-trunk.patch
>
>
> This bug is rather easy to hit if the {{TimeoutMonitor}} is on; otherwise I 
> think it's still possible to hit it if a region fails to open for more 
> obscure reasons such as HDFS errors.
> Consider a region that just went through distributed splitting and that's now 
> being opened by a new RS. The first thing it does is to read the recovery 
> files and put the edits in the {{MemStores}}. If this process takes a long 
> time, the master will move that region away. At that point the edits are 
> still accounted for in the global {{MemStore}} size but they are dropped when 
> the {{HRegion}} gets cleaned up. The problem is completely invisible until 
> the {{MemStoreFlusher}} needs to force-flush a region and finds that none of 
> them has any edits:
> {noformat}
> 2012-03-21 00:33:39,303 DEBUG 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up 
> because memory above low water=5.9g
> 2012-03-21 00:33:39,303 ERROR 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Cache flusher failed 
> for entry null
> java.lang.IllegalStateException
>         at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:129)
>         at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:199)
>         at 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:223)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> The {{null}} here is a region. In my case so many edits were in the 
> {{MemStore}} during recovery that the accounting puts me over the low barrier 
> although the real size is 0. It happened yesterday and it's still printing 
> this out.
> To fix this we need to be able to decrease the global {{MemStore}} size when 
> the region can't open.
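
The accounting fix described above can be illustrated with a minimal sketch. It is 
not the code from the attached patches: the class name ReplayAccountingSketch and 
all of its methods are hypothetical, and it assumes only that the region server 
keeps one global counter plus a per-region record of the bytes added during 
replay, so that a failed open can subtract exactly what it added.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the region server's global MemStore accounting. */
class ReplayAccountingSketch {
  // Global MemStore size that the flusher compares against its water marks.
  private final AtomicLong globalMemStoreSize = new AtomicLong(0);

  // Bytes added by recovered-edits replay, keyed by regions still being opened.
  private final ConcurrentMap<String, AtomicLong> replayEditsPerRegion =
      new ConcurrentHashMap<>();

  long getGlobalMemStoreSize() {
    return globalMemStoreSize.get();
  }

  /** Records replayed bytes for a region and adds them to the global size. */
  long addReplayedEdits(String regionName, long bytes) {
    replayEditsPerRegion
        .computeIfAbsent(regionName, k -> new AtomicLong(0))
        .addAndGet(bytes);
    return globalMemStoreSize.addAndGet(bytes);
  }

  /** The region failed to open: subtract what its replay added globally. */
  void rollbackReplayedEdits(String regionName) {
    AtomicLong replayed = replayEditsPerRegion.remove(regionName);
    if (replayed != null) {
      globalMemStoreSize.addAndGet(-replayed.get());
    }
  }

  /** The region opened: its edits are live, so only stop tracking them. */
  void clearReplayedEdits(String regionName) {
    replayEditsPerRegion.remove(regionName);
  }
}
```

In this sketch the open path would call addReplayedEdits while reading the 
recovery files, then either clearReplayedEdits on success or 
rollbackReplayedEdits on failure. Without the rollback, the global counter keeps 
the dropped bytes forever, which matches the situation in the log above where the 
flusher is woken up but finds no region to flush.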

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
