[
https://issues.apache.org/jira/browse/HBASE-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang resolved HBASE-10345.
-------------------------------
Resolution: Later
Now that we use the master local region, the old master cannot even process
a region assignment because it can no longer store procedures, so the problem
is less harmful now. Resolving for now. Feel free to reopen if you have new
ideas.
Thanks.
> HMaster should not serve when disconnected with ZooKeeper
> ---------------------------------------------------------
>
> Key: HBASE-10345
> URL: https://issues.apache.org/jira/browse/HBASE-10345
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.94.3
> Reporter: chendihao
> Priority: Major
>
> Refer to HBASE-9468 (Previous active master can still serves RPC request when
> it is trying recovering expired zk session): we can fail fast to avoid
> having two active masters at the same time. But this problem may also occur
> before the session expires. When the master receives a Disconnected event, we
> cannot be sure that this active master will be able to communicate with zk
> later. And it does not know whether a backup master has become the new active
> master until it receives the Expired event (which may never arrive). During
> this unsure-who-is-active-master period, the current active master should not
> serve (maybe turn off RpcServer).
> Here is the statement from "ZooKeeper: Distributed Process Coordination", p. 101:
> {quote}
> If the developer is not careful, the old leader will continue to act as a
> leader and may take actions that conflict with those of the new leader. For
> this reason, when a process receives a Disconnected event, the process should
> suspend actions taken as a leader until it reconnects. Normally this
> reconnect happens very quickly.
> {quote}
> So it is equally necessary to handle the Disconnected event and the Expired event.
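
For illustration only, the suspend-on-Disconnected behavior proposed in the description could be sketched roughly as below. This is not HBase code: `LeaderGate`, `SessionEvent`, `onEvent`, and `mayServe` are hypothetical names, and the enum only mirrors the ZooKeeper session states mentioned in the issue (Disconnected, SyncConnected, Expired). The sketch assumes the master starts out connected and active, and that a reconnect within the same session means it still holds leadership.

```java
// Hypothetical sketch, not taken from HBase or the ZooKeeper client library.
class LeaderGate {
    /** Hypothetical events modeled on the ZooKeeper session states named in the issue. */
    enum SessionEvent { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    // Assumption: the gate is created by a master that is already connected and active.
    private boolean serving = true;
    private boolean aborted = false;

    /** Updates the gate when a session event is delivered. */
    synchronized void onEvent(SessionEvent event) {
        switch (event) {
            case DISCONNECTED:
                // Unsure-who-is-active period: stop acting as the leader.
                serving = false;
                break;
            case SYNC_CONNECTED:
                // Same session re-established: resume, unless the session already expired.
                if (!aborted) {
                    serving = true;
                }
                break;
            case EXPIRED:
                // Session is gone for good: never serve again from this instance.
                aborted = true;
                serving = false;
                break;
        }
    }

    /** Request handlers would check this before doing any leader-only work. */
    synchronized boolean mayServe() {
        return serving;
    }

    synchronized boolean isAborted() {
        return aborted;
    }
}
```

In a real master the gate would be driven by the ZooKeeper watcher callback, and the RPC layer would reject or queue requests while `mayServe()` returns false. Whether to reject, queue, or shut down the RpcServer entirely is a design choice that this sketch leaves open.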