[
https://issues.apache.org/jira/browse/HBASE-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622743#comment-14622743
]
Jerry He commented on HBASE-14000:
----------------------------------
Okay, got it. You hit an interesting race condition.
HM1 was registered in ZK as the active master, but lost to HM2 after the cluster start.
HM1's master znode expired after the RS had grabbed its address, but before HM2
became the active master.
> Region server failed to report Master and stuck in reportForDuty retry loop
> ---------------------------------------------------------------------------
>
> Key: HBASE-14000
> URL: https://issues.apache.org/jira/browse/HBASE-14000
> Project: HBase
> Issue Type: Bug
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Attachments: HBASE-14000.patch, HM_RS-Log_snippet.txt
>
>
> In an HA cluster, a region server gets stuck in the reportForDuty retry loop if the
> active master is restarting and a master switch happens before the region
> server reports successfully.
> The root cause is the same as HBASE-13317, but here the region server tried to connect
> to the master while it was still starting, so the rssStub reset didn't happen, because
> this branch only logs:
> {code}
> if (ioe instanceof ServerNotRunningYetException) {
>   LOG.debug("Master is not running yet");
> }
> {code}
> By the time the master finished starting, a master switch had happened, so the RS
> kept trying to connect to the standby master.
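
To make the race easier to follow, here is a minimal, self-contained sketch of it. It is not the attached patch and does not use real HBase classes: FakeZk, MasterStub, attemptsUntilRegistered and the local ServerNotRunningYetException are hypothetical stand-ins, and the only assumption about the fix is the one stated above, namely that the cached stub has to be dropped so the next retry looks up the active master again.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

class ReportForDutySketch {

    /** Local stand-in for the exception a not-yet-active master returns (assumption, not the HBase class). */
    static class ServerNotRunningYetException extends IOException {
        ServerNotRunningYetException(String msg) { super(msg); }
    }

    /** Stand-in for the master znode: holds the name of the currently active master. */
    static final class FakeZk {
        final AtomicReference<String> activeMaster = new AtomicReference<>();
    }

    /** Stand-in for an RPC stub that stays bound to the master it was created for. */
    static final class MasterStub {
        final String target;
        MasterStub(String target) { this.target = target; }

        void regionServerStartup(FakeZk zk) throws ServerNotRunningYetException {
            // Only the active master accepts the report; a standby keeps rejecting it.
            if (!target.equals(zk.activeMaster.get())) {
                throw new ServerNotRunningYetException(target + " is not the active master");
            }
        }
    }

    /**
     * Replays the race: the RS caches a stub for HM1, then HM2 becomes active.
     * Returns the attempt number on which the report succeeded, or -1 if it never did.
     */
    static int attemptsUntilRegistered(boolean resetStubOnNotRunning, int maxAttempts) {
        FakeZk zk = new FakeZk();
        zk.activeMaster.set("HM1");
        MasterStub stub = new MasterStub(zk.activeMaster.get()); // RS grabs HM1's address
        zk.activeMaster.set("HM2");                              // HM1's znode expires, HM2 wins

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (stub == null) {
                stub = new MasterStub(zk.activeMaster.get());    // re-resolve the active master
            }
            try {
                stub.regionServerStartup(zk);
                return attempt;
            } catch (ServerNotRunningYetException e) {
                if (resetStubOnNotRunning) {
                    stub = null; // drop the stale stub so the next retry looks up ZK again
                }
                // Without the reset, the loop keeps calling the same (now standby) master.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + attemptsUntilRegistered(false, 5));
        System.out.println("with reset:    " + attemptsUntilRegistered(true, 5));
    }
}
```

In this sketch the run without the reset never registers within the attempt limit, while the run with the reset succeeds on the retry that follows the first failure.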