[ 
https://issues.apache.org/jira/browse/HBASE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975295#comment-13975295
 ] 

Jimmy Xiang commented on HBASE-9740:
------------------------------------

That's right. In 0.96+, the region will be moved to the failed_open state. Ops/admin 
needs to investigate, fix the problem, and assign the region again.  We were 
talking about showing the problem on the master web UI, but haven't done it yet.
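
To illustrate the behavior described above, here is a minimal sketch of a 
bounded assignment loop. It is not HBase code: the class AssignmentSketch, 
the RegionState enum, the maxAttempts limit and the opener callback are all 
hypothetical names made up for this example. It only models the idea that 
the master gives up after a fixed number of failed opens, parks the region 
in a failed-open state, and waits for an explicit re-assign.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

/** Hypothetical model of bounded region assignment; none of these names are HBase APIs. */
final class AssignmentSketch {
    enum RegionState { OFFLINE, OPEN, FAILED_OPEN }

    private final Map<String, RegionState> states = new HashMap<>();
    private final int maxAttempts; // assumed retry limit, not an HBase setting

    AssignmentSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /**
     * Tries the servers round-robin, at most maxAttempts times in total.
     * opener.test(server, region) returns true if that server opened the region.
     * On exhaustion the region is parked in FAILED_OPEN instead of being retried forever.
     */
    RegionState assign(String region, List<String> servers, BiPredicate<String, String> opener) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no region servers available");
        }
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String server = servers.get(attempt % servers.size());
            if (opener.test(server, region)) {
                states.put(region, RegionState.OPEN);
                return RegionState.OPEN;
            }
        }
        states.put(region, RegionState.FAILED_OPEN);
        return RegionState.FAILED_OPEN;
    }

    /** Regions that were never assigned are reported as OFFLINE. */
    RegionState stateOf(String region) {
        return states.getOrDefault(region, RegionState.OFFLINE);
    }

    public static void main(String[] args) {
        AssignmentSketch master = new AssignmentSketch(3);
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3");

        // Corrupt HFile: every server fails to open the region.
        master.assign("region-a", servers, (server, region) -> false);
        System.out.println(master.stateOf("region-a"));

        // After the operator fixes the file, an explicit re-assign succeeds.
        master.assign("region-a", servers, (server, region) -> true);
        System.out.println(master.stateOf("region-a"));
    }
}
```

The point of the design is that a failure which no server can recover from 
ends in a terminal state that a person has to clear, rather than in another 
round of assignments.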

> A corrupt HFile could cause endless attempts to assign the region without a 
> chance of success
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9740
>                 URL: https://issues.apache.org/jira/browse/HBASE-9740
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.16
>            Reporter: Aditya Kishore
>            Assignee: Ping
>             Fix For: 0.94.19
>
>         Attachments: HBase-9740_0.94_v4.patch, HBase-9749_0.94_v2.patch, 
> HBase-9749_0.94_v3.patch, patch-9740_0.94.txt
>
>
> As described in HBASE-9737, a corrupt HFile in a region can lead to an 
> assignment storm in the cluster, since the Master keeps trying to assign 
> the region to each region server in turn and none of them can succeed.
> The region server, upon detecting such a scenario, should mark the region as 
> "RS_ZK_REGION_FAILED_ERROR" (or something to that effect) in ZooKeeper, 
> which should tell the Master to stop assigning the region until the error 
> has been resolved (via an HBase shell command, probably "assign"?).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
