jsancio commented on code in PR #18240:
URL: https://github.com/apache/kafka/pull/18240#discussion_r1899633981
##########
raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java:
##########
@@ -2935,14 +3014,18 @@ private long pollResigned(long currentTimeMs) {
// until either the shutdown expires or an election bumps the epoch
stateTimeoutMs = shutdown.remainingTimeMs();
} else if (state.hasElectionTimeoutExpired(currentTimeMs)) {
- if (quorum.isVoter()) {
- transitionToCandidate(currentTimeMs);
- } else {
+// if (quorum.isVoter()) {
+ // canElectNewLeaderAfterOldLeaderPartitioned fails if we do not bump epoch since it is possible
+ // that the replica ends up as follower in the same epoch.
+ // resigned(leaderId=local) -> prospective(leaderId=local) -> follower(leaderId=local) which is illegal
+// transitionToProspective(quorum.epoch() + 1, currentTimeMs);
+// transitionToCandidate(currentTimeMs);
+// } else {
Review Comment:
> the existing raft event simulation tests picked up on a new bug in pollResigned
What is the exact error? Let's add a unit test to one of the
`KafkaRaftClient*Test` suites that shows the bug.
> attempt to become follower of itself in epoch 5.
Let's add a check to `transitionToFollower` that verifies that `leaderId` is
not equal to `localId`.
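To illustrate the kind of guard meant here, a minimal sketch follows. It assumes a simplified stand-in for the quorum state: the class `FollowerGuardSketch`, its fields, and its method signature are hypothetical and do not mirror the real `QuorumState` or `KafkaRaftClient` code.

```java
import java.util.OptionalInt;

/**
 * Hypothetical, simplified state holder used only for illustration.
 * It is not Kafka's QuorumState; names and signatures are assumptions.
 */
final class FollowerGuardSketch {
    private final int localId;
    private int epoch;
    private OptionalInt leaderId = OptionalInt.empty();

    FollowerGuardSketch(int localId, int epoch) {
        this.localId = localId;
        this.epoch = epoch;
    }

    /** Become a follower of newLeaderId, rejecting the illegal "follower of itself" case. */
    void transitionToFollower(int newEpoch, int newLeaderId) {
        if (newLeaderId == localId) {
            throw new IllegalStateException(
                "Cannot become follower of local id " + localId + " in epoch " + newEpoch);
        }
        if (newEpoch < epoch) {
            throw new IllegalStateException(
                "Cannot move from epoch " + epoch + " back to epoch " + newEpoch);
        }
        // State is only mutated after both checks pass.
        this.epoch = newEpoch;
        this.leaderId = OptionalInt.of(newLeaderId);
    }

    int epoch() {
        return epoch;
    }

    OptionalInt leaderId() {
        return leaderId;
    }
}
```

With a guard like this, the `resigned(leaderId=local) -> prospective(leaderId=local) -> follower(leaderId=local)` sequence would fail fast at the last step instead of leaving the replica in an inconsistent state.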
It makes sense to me that after the resigned state the replica should always
increase its epoch. The replica resigned from leadership at epoch X, so
eventually the epoch will be at least X + 1. Did you consider transitioning to
candidate and relaxing the transition functions to allow both resigned and
prospective to transition to candidate?
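As a rough sketch of what relaxing the transition could look like, assuming a simplified state machine: the class `CandidateTransitionSketch`, the `Role` enum, and the rule that becoming a candidate bumps the epoch by one are illustrative assumptions, not the actual transition functions in the PR.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Hypothetical, simplified state machine used only for illustration.
 * It is not Kafka's QuorumState; names and rules are assumptions.
 */
final class CandidateTransitionSketch {
    enum Role { FOLLOWER, RESIGNED, PROSPECTIVE, CANDIDATE }

    // Relaxed rule: both resigned and prospective may move to candidate.
    private static final Set<Role> MAY_BECOME_CANDIDATE =
        EnumSet.of(Role.RESIGNED, Role.PROSPECTIVE);

    private Role role;
    private int epoch;

    CandidateTransitionSketch(Role role, int epoch) {
        this.role = role;
        this.epoch = epoch;
    }

    /** Move to candidate and bump the epoch, so a resigned leader at epoch X ends up at X + 1. */
    void transitionToCandidate() {
        if (!MAY_BECOME_CANDIDATE.contains(role)) {
            throw new IllegalStateException("Cannot transition to candidate from " + role);
        }
        epoch += 1;
        role = Role.CANDIDATE;
    }

    Role role() {
        return role;
    }

    int epoch() {
        return epoch;
    }
}
```

Because the epoch increases as part of the transition, a replica that resigned at epoch X can never end up as a follower of itself in epoch X.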