jsancio commented on code in PR #19747: URL: https://github.com/apache/kafka/pull/19747#discussion_r2098319932
########## raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java: ########## @@ -1020,24 +1020,14 @@ private void maybeHandleElectionLoss(NomineeState state, long currentTimeMs) { if (state instanceof CandidateState candidate) { if (candidate.epochElection().isVoteRejected() && !candidate.isBackingOff()) { logger.info( - "Insufficient remaining votes to become leader. We will backoff before retrying election again. " + - "Current epoch election state is {}.", + "Insufficient remaining votes to become leader. Candidate will wait the remaining election " + + "timeout ({}) before transitioning back to Prospective. Current epoch election state is {}.", + candidate.remainingElectionTimeMs(currentTimeMs), candidate.epochElection() ); - // Go immediately to a random, exponential backoff. The backoff starts low to prevent - // needing to wait the entire election timeout when the vote result has already been - // determined. The randomness prevents the next election from being gridlocked with - // another nominee due to timing. The exponential aspect limits epoch churn when the - // replica has failed multiple elections in succession. - candidate.startBackingOff( - currentTimeMs, - RaftUtil.binaryExponentialElectionBackoffMs( - quorumConfig.electionBackoffMaxMs(), - RETRY_BACKOFF_BASE_MS, - candidate.retries(), - random - ) - ); + // Mark that the candidate is now backing off. After election timeout expires the candidate will + // transition back to prospective + candidate.startBackingOff(); Review Comment: It looks like we recored the `isBackingOff` state simply so we can log this message at most once per epoch. Maybe we can remove this state if we change `EpochElection#recordVote` so that it returns true if the vote resulted in the election getting granted or rejected. The other option is to log this message multiple times. Once for each vote response after the election was lost. I am okay with this since election should be infrequent and election are throttle by the election timeout. ########## raft/src/main/java/org/apache/kafka/raft/QuorumState.java: ########## @@ -505,7 +505,6 @@ public void prospectiveAddVotedState( ); } - ProspectiveState prospectiveState = prospectiveStateOrThrow(); // Note that we reset the election timeout after voting for a candidate because we // know that the candidate has at least as good of a chance of getting elected as us Review Comment: Outside the scope of this PR but this is not true when there is a network partition. We should not reset the election timer if a vote is granted. The worst case, if the replica gets this wrong, is that the replica transitions back to the follower or unattached state. ########## raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java: ########## @@ -3149,17 +3139,12 @@ private long pollCandidate(long currentTimeMs) { // 3) the shutdown timer expires long minRequestBackoffMs = maybeSendVoteRequests(state, currentTimeMs); return Math.min(shutdown.remainingTimeMs(), minRequestBackoffMs); - } else if (state.isBackingOff()) { - if (state.isBackoffComplete(currentTimeMs)) { - logger.info("Transition to prospective after election backoff has completed"); - transitionToProspective(currentTimeMs); - return 0L; - } - return state.remainingBackoffMs(currentTimeMs); } else if (state.hasElectionTimeoutExpired(currentTimeMs)) { logger.info("Election was not granted, transitioning to prospective"); transitionToProspective(currentTimeMs); return 0L; + } else if (state.isBackingOff()) { + return state.remainingElectionTimeMs(currentTimeMs); Review Comment: Why do you need this? KRaft already takes the min of the request timeout and the election timeout in line 3150. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org