On cluster startup, master/RS connect to ZK before it's fully ready, causing a ConnectionLossException
-----------------------------------------------------------------------------------------------------
Key: HBASE-2971
URL: https://issues.apache.org/jira/browse/HBASE-2971
Project: HBase
Issue Type: Bug
Components: zookeeper
Affects Versions: 0.90.0
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Fix For: 0.90.0
There is an existing race condition that has so far been glossed over because
of our "loose" ZK usage.
The ZK server process can be in a state where it accepts the socket connection
from our client in the master or RS, but any operation we then issue against
the server fails with a ConnectionLossException. The ZK client handles this
automagically and reconnects properly, as long as we do not abort when we get
this exception.
So the current behavior works on the last 0.89 and even with the master
rewrite, but as we move toward strict usage of ZK, we should wait for ZK to
become available before proceeding with startup.
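Waiting for ZK before proceeding with startup can be sketched as a bounded retry loop that treats connection loss as retryable instead of aborting. This is a minimal Python sketch under stated assumptions, not the actual patch: `ConnectionLossError` is a hypothetical stand-in for ZooKeeper's ConnectionLossException, `FakeZkServer` is a hypothetical stub that accepts the connection but fails its first few requests, and `wait_for_zk` with its timeout and backoff values is an illustrative helper.

```python
import time


class ConnectionLossError(Exception):
    """Hypothetical stand-in for ZooKeeper's ConnectionLossException."""


class FakeZkServer:
    """Hypothetical stub: the socket is accepted, but the first
    `warmup_calls` requests fail with connection loss."""

    def __init__(self, warmup_calls):
        self.remaining_failures = warmup_calls
        self.calls = 0

    def exists(self, path):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionLossError("server accepted socket but is not ready")
        return path == "/"


def wait_for_zk(probe, timeout_s=30.0, initial_delay_s=0.05, max_delay_s=1.0,
                clock=time.monotonic, sleep=time.sleep):
    """Block until probe() succeeds, retrying on connection loss.

    Returns the number of attempts made. Raises TimeoutError if the next
    retry would start after the deadline. `clock` and `sleep` are
    injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    attempts = 0
    while True:
        attempts += 1
        try:
            probe()
            return attempts
        except ConnectionLossError:
            # Do not abort: the ZK server may simply not be ready yet.
            if clock() + delay > deadline:
                raise TimeoutError(
                    f"ZooKeeper not available after {attempts} attempts")
            sleep(delay)
            # Exponential backoff, capped at max_delay_s.
            delay = min(delay * 2, max_delay_s)
```

For example, against a stub built with `FakeZkServer(warmup_calls=3)`, `wait_for_zk(lambda: server.exists("/"))` returns after the fourth attempt. In a real master or RS the probe would presumably be a cheap read, such as an existence check on the root znode; that choice is also an assumption here.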
I already have a working patch in a local branch. Will put up a patch against
the new master soon.
--
This message is automatically generated by JIRA.