[ https://issues.apache.org/jira/browse/HBASE-17704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887093#comment-15887093 ]
Pankaj Kumar commented on HBASE-17704:
--------------------------------------

[~apurtell],
Can we have a chore service that tries to recover regions that have been in transition for a long time (say, more than 10 minutes)?

I think that in some situations such a chore would be useful for reassigning regions that are stuck in the FAILED_OPEN/FAILED_CLOSE state indefinitely. In this JIRA's scenario, for example, the DNs came back up after some time, but the HM still could not reassign the regions.

> Regions stuck in FAILED_OPEN when HDFS blocks are missing
> ---------------------------------------------------------
>
>                 Key: HBASE-17704
>                 URL: https://issues.apache.org/jira/browse/HBASE-17704
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.1.8
>            Reporter: Mathias Herberts
>
> We recently experienced the loss of a whole rack (6 DNs + RS) in a 120 node
> cluster. This led to the regions hosted on the 6 unavailable RSs being
> reassigned to live RSs. When attempting to open some of the reassigned
> regions, some RSs encountered missing blocks and issued "No live nodes
> contain current block Block locations", putting the regions in state
> FAILED_OPEN.
> Once the lost DNs came back online, the regions were left in FAILED_OPEN,
> and all the affected RSs had to be restarted to solve the problem.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)