[jira] [Commented] (HBASE-20087) Periodically attempt redeploy of regions in FAILED_OPEN state

Andrew Purtell (JIRA) Mon, 26 Feb 2018 11:05:37 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-20087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16377379#comment-16377379
 ]


Andrew Purtell commented on HBASE-20087:
----------------------------------------

On the other hand, preserving today's behavior of transitioning region state to 
FAILED_OPEN makes sense, assuming the reasons that went into developing this 
strategy are still valid. If we assume that, but still want to achieve the 
stated aim of this JIRA, then moving that chore into the AssignmentManager (AM), 
using its scheduled executor service to drive it, isn't a bad idea. 
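
To make the idea concrete, here is a minimal sketch of a periodic retry pass 
driven by a scheduled executor. It does not use any HBase classes: the class 
FailedOpenRetrySketch, its State enum, the in-memory state map, and the 
tryAssign callback are all hypothetical stand-ins for whatever the AM actually 
tracks and calls. Only the java.util.concurrent calls are real.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch, not HBase code: region names are plain strings and
// region state lives in a simple in-memory map.
class FailedOpenRetrySketch {
  enum State { OPEN, FAILED_OPEN }

  private final Map<String, State> regionStates = new ConcurrentHashMap<>();

  // Assumed callback: returns true if the region opened successfully.
  private final Predicate<String> tryAssign;

  FailedOpenRetrySketch(Predicate<String> tryAssign) {
    this.tryAssign = tryAssign;
  }

  void markFailedOpen(String region) {
    regionStates.put(region, State.FAILED_OPEN);
  }

  State stateOf(String region) {
    return regionStates.get(region);
  }

  // One retry pass over all regions currently in FAILED_OPEN.
  // Returns the regions that opened on this pass; the rest stay FAILED_OPEN
  // and are picked up again by the next pass.
  List<String> retryFailedOpenOnce() {
    List<String> recovered = new ArrayList<>();
    for (Map.Entry<String, State> entry : regionStates.entrySet()) {
      if (entry.getValue() != State.FAILED_OPEN) {
        continue;
      }
      String region = entry.getKey();
      boolean opened;
      try {
        opened = tryAssign.test(region);
      } catch (RuntimeException e) {
        // Swallow the failure: an exception escaping a periodic task would
        // cancel all of its future runs.
        opened = false;
      }
      if (opened && regionStates.replace(region, State.FAILED_OPEN, State.OPEN)) {
        recovered.add(region);
      }
    }
    return recovered;
  }

  // Drive the retry pass from an existing scheduled executor.
  ScheduledFuture<?> schedule(ScheduledExecutorService executor,
                              long period, TimeUnit unit) {
    return executor.scheduleWithFixedDelay(
        this::retryFailedOpenOnce, period, period, unit);
  }
}
```

A caller would pass in the executor it already owns and keep the returned 
ScheduledFuture so the chore can be cancelled on shutdown. 
scheduleWithFixedDelay is used here rather than scheduleAtFixedRate so that a 
slow pass delays the next one instead of piling up runs; that is a choice made 
for this sketch, not something settled in this issue.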

> Periodically attempt redeploy of regions in FAILED_OPEN state
> -------------------------------------------------------------
>
>                 Key: HBASE-20087
>                 URL: https://issues.apache.org/jira/browse/HBASE-20087
>             Project: HBase
>          Issue Type: Improvement
>          Components: master, Region Assignment
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>            Priority: Major
>             Fix For: 2.0.0, 1.5.0
>
>         Attachments: 
> 0001-W-4723090-Port-the-RIT-FAILED_OPEN-state-hack-from-R.patch
>
>
> Because RSGroups can cause permanent RIT with regions in FAILED_OPEN state, 
> we added logic to the master portion of the RSGroups extension to enumerate 
> RITs and retry assignment of regions in FAILED_OPEN state.
> However, this strategy can be applied generally to reduce the need for 
> operator involvement in cluster operations. Today an operator has to manually 
> resolve FAILED_OPEN assignments, but there is little risk in automatically 
> retrying them after a while. If the reason the assignment failed has not 
> cleared, the assignment will just fail again. If the reason has been 
> resolved, operators don't have to do anything more for the cluster to fully 
> heal. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
