cmccabe commented on PR #16900: URL: https://github.com/apache/kafka/pull/16900#issuecomment-2299868510
> I don't want raft to forget or expose forgotten state. At one point raft knew that the leader was Y at epoch X. Later, for the same epoch, raft would report that the leader is unknown for epoch X. I don't want raft to expose that it forgot that the leader was Y at epoch X. You can think of this as loss of data. Raft needs to stay at epoch X because it is resigning. The expectation is that some other voter will increase the epoch to X + 1 and try to win the election.

If you read the JavaDoc of `RaftClient.Listener.handleLeaderChange`, it states that:

> the implementation of method should expect this method will be called at most twice for each epoch. Once if the epoch changed but the leader is not known and once when the leader is known for the current epoch.

To me this implies that the best thing to do would be to immediately fire off a callback with epoch = X + 1, leader = empty to let everyone know that we lost leadership. Then, later on, once we figure out who the leader is, fire off another callback with epoch = X + 1, leader = new-leader.

Now, the JavaDoc also states that "if this node is the leader, then the notification of leadership will be delayed" (why?).

From a correctness point of view, we can certainly wait before delivering this callback, but I think it's suboptimal. You are keeping everyone outside the raft layer "in the dark" while you wait for the new leader to settle, which will result in them doing wasted work. For example, if you're doing ZK migration, the controller may keep trying to send out UpdateLeaderAndIsrRequest even after the rest of the cluster has moved on. The controller epoch should protect us, but... it feels suboptimal.

Anyway, let's get this bugfix in and discuss more next week.
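To make the two-callback sequence concrete, here is a rough sketch of the ordering proposed above. It is not Kafka's actual code: `LeaderChangeSketch`, `Notification`, `onLeadershipLost`, and `onLeaderElected` are all hypothetical names, and the only type it uses is `java.util.OptionalInt` from the JDK. It assumes the raft layer bumps the epoch as soon as leadership is lost, which is the behavior being suggested here, not necessarily what the PR does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical stand-in for the raft layer's listener notifications (not Kafka's API).
final class LeaderChangeSketch {
    /** One (epoch, leader) notification; an empty leaderId means "leader unknown". */
    static final class Notification {
        final int epoch;
        final OptionalInt leaderId;

        Notification(int epoch, OptionalInt leaderId) {
            this.epoch = epoch;
            this.leaderId = leaderId;
        }

        @Override
        public String toString() {
            return "epoch=" + epoch + ", leader="
                + (leaderId.isPresent() ? String.valueOf(leaderId.getAsInt()) : "empty");
        }
    }

    private int epoch;
    private OptionalInt leaderId;
    private final List<Notification> fired = new ArrayList<>();

    LeaderChangeSketch(int epoch, int leaderId) {
        this.epoch = epoch;
        this.leaderId = OptionalInt.of(leaderId);
    }

    /** Leadership lost: move to epoch X + 1 and report "leader unknown" right away. */
    void onLeadershipLost() {
        epoch += 1;
        leaderId = OptionalInt.empty();
        fired.add(new Notification(epoch, leaderId));
    }

    /** Election settled: report the leader for that epoch, at most once per epoch. */
    void onLeaderElected(int electedEpoch, int newLeaderId) {
        if (electedEpoch < epoch) {
            return; // stale result from an older epoch; epochs never go backwards
        }
        if (electedEpoch == epoch && leaderId.isPresent()) {
            if (leaderId.getAsInt() != newLeaderId) {
                throw new IllegalStateException("leader changed within epoch " + epoch);
            }
            return; // already reported this leader for this epoch
        }
        epoch = electedEpoch;
        leaderId = OptionalInt.of(newLeaderId);
        fired.add(new Notification(epoch, leaderId));
    }

    /** Notifications delivered so far, in order. */
    List<Notification> fired() {
        return fired;
    }

    public static void main(String[] args) {
        LeaderChangeSketch raft = new LeaderChangeSketch(5, 1); // node 1 leads at epoch X = 5
        raft.onLeadershipLost();     // first callback for epoch X + 1, leader unknown
        raft.onLeaderElected(6, 2);  // second callback for epoch X + 1, leader known
        raft.fired().forEach(System.out::println);
    }
}
```

With this ordering, listeners outside the raft layer see the loss of leadership immediately and never see a known leader revert to unknown within the same epoch, since the "unknown" report always arrives under a new epoch.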