Sirius created ZOOKEEPER-4685:
---------------------------------

Summary: Unnecessary system unavailability due to Leader shutdown when follower sent ACK of PROPOSAL before sending ACK of NEWLEADER in log recovery
Key: ZOOKEEPER-4685
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4685
Project: ZooKeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.8.1, 3.7.1, 3.8.0, 3.7.0, 3.6.3
Reporter: Sirius

When a follower processes the NEWLEADER message in the SYNC phase, it calls {{logRequest(..)}} to submit the txn persistence task to the SyncRequestProcessor thread. The SyncRequestProcessor thread may persist a txn and reply with the ACK of that txn before the follower replies with ACK-LD (i.e. the ACK of NEWLEADER) to the leader. As a result, the leader may fail to collect enough ACK-LDs, shut down, and trigger a new round of election. This introduces unnecessary recovery procedures, delays the servers' entry into the BROADCAST phase, and reduces the service's availability.

The following trace can be reproduced on the latest version.

h2. Trace

Start the ensemble with three nodes: +S0+, +S1+ & +S2+.
 - +S2+ is elected leader.
 - +S2+ logs a new txn <1, 1> and broadcasts it.
 - +S0+ restarts & +S1+ crashes before receiving the proposal of <1, 1>.
 - +S2+ is elected leader again.
 - +S2+ syncs with +S0+ using DIFF, and sends the proposal of <1, 1> during SYNC.
 - After +S0+ receives NEWLEADER, +S0+'s sync thread may persist the txn <1, 1> and reply with the corresponding ACK to the leader +S2+ before +S0+'s QuorumPeer thread replies with ACK-LD to the leader +S2+. (This is possible because txn logging is processed asynchronously by the sync thread!)
 - The corresponding learnerHandler on +S2+ does not expect the ACK of a proposal before ACK-LD, so it does not count it as ACK-LD and blocks at _waitForStartup()_ until the leader turns its state to _state.RUNNING_.
 - However, the QuorumPeer thread of the leader +S2+ cannot receive enough ACK-LDs, and it throws _InterruptedException_ during _waitForNewLeaderAck(..)_.
 - After that, the leader shuts down and a new round of election starts, which costs extra time to re-establish the quorum and reduces availability.
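
The ordering problem in the trace can be illustrated with a small, single-threaded model. This is only a sketch under stated assumptions, not ZooKeeper code: the class {{AckOrderSketch}}, the {{Ack}} type and the method {{leaderReachesBroadcast}} are hypothetical names, the new leader epoch is assumed to be 2 (so NEWLEADER carries zxid <2, 0>), and the handler is assumed to inspect only the first ACK from a follower before blocking until the leader is running.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AckOrderSketch {

    /** Hypothetical packet model: an ACK identified by sender id and acknowledged zxid. */
    static final class Ack {
        final long sid;
        final long zxid;

        Ack(long sid, long zxid) {
            this.sid = sid;
            this.zxid = zxid;
        }
    }

    /** Packs <epoch, counter> into one 64-bit zxid (epoch in the high 32 bits). */
    static long zxid(long epoch, long counter) {
        return (epoch << 32) | counter;
    }

    /**
     * Simplified model of ACK-LD collection on the leader (an assumption, not the real code).
     * Each follower's handler looks at the first ACK only: it counts as ACK-LD if its zxid
     * matches the NEWLEADER zxid. After that first ACK the handler stops reading until the
     * leader is running, so later packets from the same follower are never seen here.
     */
    static boolean leaderReachesBroadcast(int ensembleSize, long leaderSid,
                                          long newLeaderZxid, List<Ack> arrivals) {
        Set<Long> ackLdSet = new HashSet<>();
        ackLdSet.add(leaderSid);                  // the leader counts its own ACK-LD
        Set<Long> blockedHandlers = new HashSet<>();
        for (Ack ack : arrivals) {
            if (blockedHandlers.contains(ack.sid)) {
                continue;                         // handler is blocked waiting for startup
            }
            if (ack.zxid == newLeaderZxid) {
                ackLdSet.add(ack.sid);            // first ACK was the expected ACK-LD
            }
            blockedHandlers.add(ack.sid);         // either way, the handler now waits
        }
        return ackLdSet.size() > ensembleSize / 2; // strict majority required
    }

    public static void main(String[] args) {
        long newLeader = zxid(2, 0);              // assumed NEWLEADER zxid for epoch 2
        long txn = zxid(1, 1);                    // the txn <1, 1> from the trace
        // S1 has crashed, so only S0 (sid 0) answers the leader S2 (sid 2).
        List<Ack> ackLdFirst = List.of(new Ack(0, newLeader), new Ack(0, txn));
        List<Ack> ackFirst = List.of(new Ack(0, txn), new Ack(0, newLeader));
        System.out.println("ACK-LD first: " + leaderReachesBroadcast(3, 2, newLeader, ackLdFirst));
        System.out.println("ACK first:    " + leaderReachesBroadcast(3, 2, newLeader, ackFirst));
    }
}
```

In this model the same two packets from +S0+ give a quorum when ACK-LD arrives first and no quorum when the ACK of <1, 1> arrives first, which corresponds to the leader shutdown described in the trace.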