[
https://issues.apache.org/jira/browse/HBASE-24292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17303612#comment-17303612
]
Nick Dimiduk commented on HBASE-24292:
--------------------------------------
I am of the opinion that, at the very least, a master that cannot make
progress should release its lock on the active status. I don't have a strong
feeling about whether it does so by aborting.
[~zhangduo] and [~stack] have spent a lot of time in that part of the code. Do
either of you have strong opinions on a way out of this infinite loop?
> A "stuck" master should not idle as active without taking action
> ----------------------------------------------------------------
>
> Key: HBASE-24292
> URL: https://issues.apache.org/jira/browse/HBASE-24292
> Project: HBase
> Issue Type: Bug
> Components: master, Region Assignment
> Affects Versions: 2.3.0
> Reporter: Nick Dimiduk
> Assignee: Rahul Kumar
> Priority: Critical
>
> The master schedules an SCP for the region server hosting meta. However, due
> to a misconfiguration, the cluster cannot make progress. After fixing the
> configuration issue and restarting, the cluster still cannot make progress.
> After the configured period (15 minutes), the master enters a "holding
> pattern" where it retains Active master status, but isn't taking any action.
> This "brown-out" state is toxic. It should either keep trying to make
> progress, or it should abort. Staying up and not doing anything is the wrong
> thing to do.
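
The two acceptable behaviours described above (keep retrying, or give up the
active role) can be illustrated with a minimal sketch. This is not HBase code:
the class, the method, and the `tryProgress` and `releaseActive` callbacks are
hypothetical names invented for illustration, and the real master would need to
decide what "releasing active status" means (aborting is one option). The only
point is that the loop never ends in an idle state while still holding the
active role.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch only; none of these names exist in HBase. */
public class StuckMasterSketch {

    enum Outcome { PROGRESSED, RELEASED_ACTIVE }

    /**
     * Retries tryProgress until it succeeds or the deadline passes.
     * If the deadline passes (or the thread is interrupted), releaseActive
     * is invoked so the caller does not keep the active role while idle.
     */
    static Outcome runUntilProgressOrRelease(BooleanSupplier tryProgress,
                                             Duration deadline,
                                             Duration retryInterval,
                                             Runnable releaseActive) {
        long deadlineNanos = System.nanoTime() + deadline.toNanos();
        while (true) {
            if (tryProgress.getAsBoolean()) {
                return Outcome.PROGRESSED;
            }
            // Overflow-safe comparison, as recommended for System.nanoTime().
            if (System.nanoTime() - deadlineNanos >= 0) {
                break;
            }
            try {
                Thread.sleep(retryInterval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt and stop retrying.
                Thread.currentThread().interrupt();
                break;
            }
        }
        // Never fall through to "do nothing": hand back the active role.
        releaseActive.run();
        return Outcome.RELEASED_ACTIVE;
    }
}
```

In this sketch the deadline plays the role of the configured period mentioned
in the description; once it passes, the release callback always runs.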
--
This message was sent by Atlassian Jira
(v8.3.4#803005)