[ https://issues.apache.org/jira/browse/HBASE-17704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887093#comment-15887093 ]
Pankaj Kumar commented on HBASE-17704:
--------------------------------------
[~apurtell], could we add a chore service that tries to recover regions that
have been in transition for a long time (say > 10 min)?
I think such a chore would be useful in some situations for reassigning
regions that are stuck indefinitely in the FAILED_OPEN/FAILED_CLOSE state.
In this JIRA's scenario, for example, the DNs came back up after some time,
but the HM still couldn't reassign the regions.
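
To make the proposal concrete, here is a rough sketch of what such a chore
could look like. It is only an illustration under stated assumptions: the
class StuckRegionChore, its State enum, recordTransition() and the reassigner
callback are all hypothetical names, not existing HBase APIs, and the real
bookkeeping would live in the master's assignment code.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical sketch of the proposed chore; none of these names are HBase APIs. */
class StuckRegionChore implements Runnable {

    /** Simplified stand-in for the master's region states. */
    enum State { OPENING, CLOSING, FAILED_OPEN, FAILED_CLOSE }

    /** A region's current state and the time (ms) at which it entered it. */
    static final class Transition {
        final State state;
        final long sinceMillis;

        Transition(State state, long sinceMillis) {
            this.state = state;
            this.sinceMillis = sinceMillis;
        }
    }

    private final Map<String, Transition> regionsInTransition = new ConcurrentHashMap<>();
    private final long thresholdMillis;
    private final LongSupplier clock;          // injected so the logic can be tested
    private final Consumer<String> reassigner; // assumed hook that retries assignment

    StuckRegionChore(Duration threshold, LongSupplier clock, Consumer<String> reassigner) {
        this.thresholdMillis = threshold.toMillis();
        this.clock = clock;
        this.reassigner = reassigner;
    }

    /** Records that a region entered the given state at the given time. */
    void recordTransition(String region, State state, long nowMillis) {
        regionsInTransition.put(region, new Transition(state, nowMillis));
    }

    /** Returns regions that have sat in FAILED_OPEN/FAILED_CLOSE longer than the threshold. */
    List<String> findStuck(long nowMillis) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Transition> e : regionsInTransition.entrySet()) {
            Transition t = e.getValue();
            boolean failed = t.state == State.FAILED_OPEN || t.state == State.FAILED_CLOSE;
            if (failed && nowMillis - t.sinceMillis > thresholdMillis) {
                stuck.add(e.getKey());
            }
        }
        Collections.sort(stuck); // deterministic order for logging and tests
        return stuck;
    }

    /** One pass of the chore: hand every stuck region to the reassigner. */
    @Override
    public void run() {
        for (String region : findStuck(clock.getAsLong())) {
            reassigner.accept(region);
        }
    }
}
```

Since the sketch is a plain Runnable, it could be driven periodically, for
example with ScheduledExecutorService.scheduleWithFixedDelay() and
System::currentTimeMillis as the clock. The 10 min threshold and the period
would presumably need to be configurable.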
> Regions stuck in FAILED_OPEN when HDFS blocks are missing
> ---------------------------------------------------------
>
> Key: HBASE-17704
> URL: https://issues.apache.org/jira/browse/HBASE-17704
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.1.8
> Reporter: Mathias Herberts
>
> We recently experienced the loss of a whole rack (6 DNs + RS) in a 120 node
> cluster. This led to the regions hosted on the 6 unavailable RSs being
> reassigned to live RSs. When attempting to open some of the
> reassigned regions, some RSs encountered missing blocks and issued "No live
> nodes contain current block Block locations", putting the regions in state
> FAILED_OPEN.
> Once the lost DNs came back online, the regions were left in
> FAILED_OPEN, and all the affected RSs had to be restarted to solve the problem.