[ https://issues.apache.org/jira/browse/ZOOKEEPER-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646820#comment-13646820 ]
Patrick Hunt commented on ZOOKEEPER-1697:
-----------------------------------------

Hi [~fpj] thanks for looking.

bq. I looked at trunk because this issue is also marked for 3.5.0.

Sorry for the confusion. I listed the "Affects Version" as 3.4.3 because that's where I saw it. I marked it as a fix for both 3.4 and 3.5 because I wanted to make sure we considered it for both. I haven't looked to see if it affects 3.5 though. To be clear, I've been looking at the 3.4 codebase, in particular the 3.4.3 release codebase, which is where the issue was seen.

bq. I had a look at trunk, and I don't understand how LearnerHandler#synced() can be the culprit here. Calls to synced() appear in two places in Leader:

Is this still a question for you if you look at the 3.4.3 codebase?

What I see in the logs is that the leader has shut down with the log I mentioned:

{noformat}
org.apache.zookeeper.server.quorum.Leader: Shutdown called
java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 2
{noformat}

All four of the followers were in the middle of snapshotting during this time. Once they complete that task they then attempt to

{noformat}
writePacket(ack, true);
{noformat}

I've verified from all four follower logs that this fails for all four in this location: acking to the leader and getting "broken pipe" because the leader has already shut down.

Based on my reading of the code:

1) It seems that the followers are busy snapshotting and have not yet ack'd.
2) Because LearnerHandler hasn't seen the ack, it has not yet updated tickOfLastAck for the first time (so it is still 0). We're in bootstrapping, so we should allow for initLimit (which is configured large relative to syncLimit).
3) The Leader lead() method is checking tickOfLastAck against syncLimit via the synced() call, which has expired, and it therefore calls shutdown.

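To make steps 1-3 concrete, here is a rough sketch of the check as I read it, next to a variant that uses initLimit until the first ack arrives. This is a simplified model, not the actual LearnerHandler source: the class SyncedCheckSketch, the LearnerState holder, the firstAckSeen flag, both method names, and the initLimit value of 100 are all made up for illustration. Only syncLimit = 10 comes from the reported config.

```java
// Simplified, hypothetical model of the leader-side liveness check.
// None of these names are taken from the ZooKeeper source.
class SyncedCheckSketch {

    // Stand-in for the per-learner state tracked on the leader.
    static final class LearnerState {
        long tickOfLastAck = 0;       // stays 0 until the first ack arrives
        boolean firstAckSeen = false; // hypothetical flag for the bootstrapping phase

        void onAck(long currentTick) {
            tickOfLastAck = currentTick;
            firstAckSeen = true;
        }
    }

    // Model of the check as described: always measured against syncLimit.
    static boolean syncedCurrent(LearnerState s, long tick, int syncLimit) {
        return s.tickOfLastAck >= tick - syncLimit;
    }

    // Model of the suggestion: allow initLimit until the first ack, syncLimit after.
    static boolean syncedProposed(LearnerState s, long tick, int initLimit, int syncLimit) {
        int limit = s.firstAckSeen ? syncLimit : initLimit;
        return s.tickOfLastAck >= tick - limit;
    }

    public static void main(String[] args) {
        int syncLimit = 10;  // from the reported config
        int initLimit = 100; // assumed value; the report only says it was increased

        LearnerState follower = new LearnerState();
        long tick = 30; // follower is still snapshotting and has never ack'd

        // 0 >= 30 - 10 does not hold, so the leader counts this follower as lost.
        System.out.println(syncedCurrent(follower, tick, syncLimit));             // false

        // 0 >= 30 - 100 holds, so the follower gets the full initLimit window.
        System.out.println(syncedProposed(follower, tick, initLimit, syncLimit)); // true

        // After the first ack the normal syncLimit window applies again.
        follower.onAck(tick);
        System.out.println(syncedProposed(follower, 35, initLimit, syncLimit));   // true
        System.out.println(syncedProposed(follower, 45, initLimit, syncLimit));   // false
    }
}
```

The point of the firstAckSeen flag in this sketch is that a follower that drops after bootstrapping is still detected within syncLimit, while one that is still loading a large snapshot is not counted as lost.
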
I'm suggesting that if initLimit were used instead during this call to synced() (but only during the bootstrapping phase; you still want to use syncLimit otherwise), the followers would eventually have been able to snapshot and send the ack, and the leader would be happy (tickOfLastAck updated appropriately). In other words, we need to extend the time period of the check in synced() during the bootstrapping of a learner.

Does this make sense to you all? If not, what am I missing?

Thanks!

> large snapshots can cause continuous quorum failure
> ---------------------------------------------------
>
>                 Key: ZOOKEEPER-1697
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1697
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.3
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.5.0, 3.4.6
>
>
> I keep seeing this on the leader:
> 2013-04-30 01:18:39,754 INFO
> org.apache.zookeeper.server.quorum.Leader: Shutdown called
> java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 2
>         at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:447)
>         at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:422)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
> The followers are downloading the snapshot when this happens, and are
> trying to do their first ACK to the leader, the ack fails with broken
> pipe.
> In this case the snapshots are large and the config has increased the
> initLimit. syncLimit is small - 10 or so with ticktime of 2000. Note
> this is 3.4.3 with ZOOKEEPER-1521 applied.
> I originally speculated that
> https://issues.apache.org/jira/browse/ZOOKEEPER-1521 might be related.
> I thought I might have broken something for this environment. That
> doesn't look to be the case.
> As it looks now it seems that 1521 didn't go far enough. The leader
> verifies that all followers have ACK'd to the leader within the last
> "syncLimit" time period.
> This runs all the time in the background on
> the leader to identify the case where a follower drops. In this case
> the followers take so long to load the snapshot that this check fails
> the very first time, as a result the leader drops (not enough ack'd
> followers w/in the sync limit) and re-election happens. This repeats
> forever. (the above error)
> this is the call:
> org.apache.zookeeper.server.quorum.LearnerHandler.synced() that's at
> odds.
> look at setting of tickOfLastAck in
> org.apache.zookeeper.server.quorum.LearnerHandler.run()
> It's not set until the follower first acks - in this case I can see
> that the followers are not getting to the ack prior to the leader
> shutting down due to the error log above.
> It seems that synced() should probably use the init limit until the
> first ack comes in from the follower. I also see that while tickOfLastAck and
> leader.self.tick are shared between two threads there is no synchronization of the
> shared resources.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira