[ https://issues.apache.org/jira/browse/HBASE-17704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891281#comment-15891281 ]

Gary Helmling commented on HBASE-17704:
---------------------------------------

HBASE-16209 added a backoff policy for retries of region open, without which 
regions would go into FAILED_OPEN quickly.  So maybe all that's needed is to 
bump up the configuration for maximum attempts 
("hbase.assignment.maximum.attempts") to Integer.MAX_VALUE?
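
A minimal sketch of that suggested change in hbase-site.xml (the property name 
is taken from the comment above; whether effectively-unbounded retries are 
appropriate for a given cluster is a judgment call, since a region with a 
permanently broken store would then retry forever instead of surfacing as 
FAILED_OPEN):

```xml
<!-- hbase-site.xml: retry region open (with backoff, per HBASE-16209)
     essentially indefinitely instead of transitioning the region to
     FAILED_OPEN after a bounded number of attempts -->
<property>
  <name>hbase.assignment.maximum.attempts</name>
  <value>2147483647</value> <!-- Integer.MAX_VALUE -->
</property>
```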

> Regions stuck in FAILED_OPEN when HDFS blocks are missing
> ---------------------------------------------------------
>
>                 Key: HBASE-17704
>                 URL: https://issues.apache.org/jira/browse/HBASE-17704
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.1.8
>            Reporter: Mathias Herberts
>
> We recently experienced the loss of a whole rack (6 DNs + RSs) in a 120-node 
> cluster. This led to the regions hosted on those 6 RSs becoming unavailable 
> and being reassigned to live RSs. When attempting to open some of the 
> reassigned regions, some RSs encountered missing blocks and issued "No live 
> nodes contain current block Block locations", putting the regions in state 
> FAILED_OPEN.
> Once the disappeared DNs came back online, the regions remained in 
> FAILED_OPEN, requiring a restart of all the affected RSs to fix the problem.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)