[
https://issues.apache.org/jira/browse/HBASE-17704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891281#comment-15891281
]
Gary Helmling commented on HBASE-17704:
---------------------------------------
So HBASE-16209 added a backoff policy for retries of region open, without which
regions would go into FAILED_OPEN quickly. So maybe all that's needed is to bump
the maximum-attempts configuration ("hbase.assignment.maximum.attempts") up
to Integer.MAX_VALUE?
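If that's the route taken, the operator-side change would be a sketch like the
following in hbase-site.xml on the master (property name from this comment; the
literal 2147483647 is Integer.MAX_VALUE):

```xml
<!-- hbase-site.xml: keep retrying region open (with the HBASE-16209 backoff)
     instead of giving up and marking the region FAILED_OPEN -->
<property>
  <name>hbase.assignment.maximum.attempts</name>
  <value>2147483647</value> <!-- Integer.MAX_VALUE -->
</property>
```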
> Regions stuck in FAILED_OPEN when HDFS blocks are missing
> ---------------------------------------------------------
>
> Key: HBASE-17704
> URL: https://issues.apache.org/jira/browse/HBASE-17704
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.1.8
> Reporter: Mathias Herberts
>
> We recently experienced the loss of a whole rack (6 DNs + RSs) in a 120 node
> cluster. This led to the regions hosted on the 6 unavailable RSs being
> reassigned to live RSs. When attempting to open some of the reassigned
> regions, some RSs encountered missing blocks and issued "No live nodes
> contain current block Block locations", putting the regions in state
> FAILED_OPEN.
> Once the lost DNs came back online, the regions remained in FAILED_OPEN,
> and a restart of all the affected RSs was needed to resolve the problem.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)