[ https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lars Hofhansl updated HBASE-5682:
---------------------------------

    Attachment: 5682-all-v2.txt

Found the problem. The clusterId could remain null permanently if HConnection.getZookeeperWatcher() was called. That call initializes HConnectionImplementation.zookeeper, so ensureZookeeperTrackers then does not reset clusterId. TestZookeeper.testClientSessionExpired does exactly that.
Also, in TestZookeeper.testClientSessionExpired the state might be CONNECTING rather than CONNECTED, depending on timing.

Upon inspection I also made clusterId, rootRegionTracker, masterAddressTracker, and zooKeeper volatile, because they can be modified by a different thread but are not exclusively accessed in a synchronized block (existing problem).

New patch that fixes the problem and passes all tests.
TestZookeeper seems to have good coverage. If I can think of more tests, I'll add them there.

> Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-5682
>                 URL: https://issues.apache.org/jira/browse/HBASE-5682
>             Project: HBase
>          Issue Type: Improvement
>          Components: client
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>         Attachments: 5682-all-v2.txt, 5682-all.txt, 5682-v2.txt, 5682.txt
>
>
> Just realized that without this HBASE-4805 is broken.
> I.e. there's no point keeping a persistent HConnection around if it can be rendered permanently unusable if the ZK connection is lost temporarily.
> Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to backport)
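
For readers following the comment above, here is a minimal sketch of the two issues it describes: a field that stays null because its initialization is skipped once another field is already set, and fields that are written by one thread but read outside a synchronized block. The class TrackerHolder and all of its methods are hypothetical stand-ins written for illustration. They are not the HBase classes and not the attached patch.

```java
/** Hypothetical stand-in for a connection object; not HConnectionImplementation. */
class TrackerHolder {
    // volatile so that a write from a watcher thread is visible to
    // readers that do not hold the lock
    private volatile Object zooKeeper;
    private volatile String clusterId;
    private int generation; // only touched while holding the lock

    /** Simulates what a session-expiry handler might do. */
    synchronized void resetTrackers() {
        zooKeeper = null;
        clusterId = null;
    }

    /** Lazily creates only the watcher, like a getter that bypasses full setup. */
    synchronized Object getWatcher() {
        if (zooKeeper == null) {
            zooKeeper = new Object();
        }
        return zooKeeper;
    }

    /** Checks each field on its own, so an existing watcher cannot mask a null clusterId. */
    synchronized void ensureTrackers() {
        if (zooKeeper == null) {
            zooKeeper = new Object();
        }
        if (clusterId == null) {
            generation++;
            clusterId = "cluster-" + generation;
        }
    }

    /** Unsynchronized read; relies on the field being volatile. */
    String getClusterId() {
        return clusterId;
    }
}
```

In this sketch, calling getWatcher() before ensureTrackers() still leaves clusterId populated afterwards, because the clusterId check is not nested inside the watcher check.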