gianm opened a new pull request #8177: Update to Curator 4.2.0, ZooKeeper 3.4.14.
URL: https://github.com/apache/incubator-druid/pull/8177

Other than generally wanting to use the latest Curator and ZK, this change is motivated by an outage I encountered last night.

I was debugging a cluster that was acting bizarrely, and in the end it turned out to have two overlords that both thought they were leader. Shortly before they both gained leadership, the ZK quorum was unavailable for about 20 seconds. It doesn't look like Druid itself was doing anything particularly wrong: logs indicated the overlords weren't ignoring `stopBeingLeader` calls or anything like that.

For these reasons, I believe the cause of the outage was https://issues.apache.org/jira/browse/CURATOR-498. This comment indicates the bug could cause two LeaderLatch users to become leaders at once: https://issues.apache.org/jira/browse/CURATOR-498?focusedCommentId=16732419&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16732419.

The bug was fixed in Curator 4.2.0.
