[ 
https://issues.apache.org/jira/browse/HBASE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975295#comment-13975295
 ] 

Jimmy Xiang edited comment on HBASE-9740 at 4/20/14 10:20 PM:
--------------------------------------------------------------

That's right. In 0.96+, the region will be moved to the failed_open state. Ops/admin 
needs to investigate, fix the problem, and assign the region again. We were 
talking about showing the problem on the master web UI, but have not done it 
yet.


was (Author: jxiang):
That's right. In 96+, the region will be moved to failed_open state. OPS/admin 
needs to investigate it, fix the problem, assign the region again.  We was 
talking about showing the problem on the master web UI, but hasn't done it yet.

> A corrupt HFile could cause endless attempts to assign the region without a 
> chance of success
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9740
>                 URL: https://issues.apache.org/jira/browse/HBASE-9740
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.16
>            Reporter: Aditya Kishore
>            Assignee: Ping
>             Fix For: 0.94.19
>
>         Attachments: HBase-9740_0.94_v4.patch, HBase-9749_0.94_v2.patch, 
> HBase-9749_0.94_v3.patch, patch-9740_0.94.txt
>
>
> As described in HBASE-9737, a corrupt HFile in a region could lead to an 
> assignment storm in the cluster, since the Master will keep trying to assign 
> the region to each region server one after another, and none will 
> succeed.
> The region server, upon detecting such a scenario, should mark the region as 
> "RS_ZK_REGION_FAILED_ERROR" (or something to that effect) in ZooKeeper, 
> which should signal the Master to stop assigning the region until the error 
> has been resolved (via an HBase shell command, probably "assign"?).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
