showuon commented on PR #20859:
URL: https://github.com/apache/kafka/pull/20859#issuecomment-3584352866

   > The follower can only move to HAS_JOINED via processing a FETCH_RESPONSE, 
which is too late (i.e. the local node has already sent an "incorrect" 
auto-join).
   > IMO this is the harder case to handle, and handling this on the leader 
side via KAFKA-19933 is kind of "wrong" too.
   
   Yes, so I think as long as KAFKA-19933 is implemented, we can make sure the 
observer cannot join the voters successfully, because 
`leaderState.isReplicaCaughtUp()` returns false. And once the addVoter request 
has been sent, the UpdateVoterSet timer is reset and a fetch request is sent next.
   
   > We might need to add another clause in `shouldSendAddOrRemoveVoter: && 
local node LEO >= leader HWM`.
   
   I understand and agree with this check; it would resolve the network 
partition issue. Alternatively, I think we can also fix the issue by only 
improving `KAFKA-19933`, without this `shouldSendAddOrRemoveVoter: && local node 
LEO >= leader HWM` change. WDYT?
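
To make the two options concrete, here is a rough, self-contained sketch. It is not the actual `KafkaRaftClient` code: the class `AutoJoinSketch`, both method names, and every parameter are hypothetical stand-ins, and the conditions other than the LEO/HWM clause are assumptions about what `shouldSendAddOrRemoveVoter` already checks.

```java
// Hypothetical sketch only; names and parameters do not mirror the real Kafka code.
public final class AutoJoinSketch {
    private AutoJoinSketch() {}

    /**
     * Follower-side option: the extra clause proposed for shouldSendAddOrRemoveVoter.
     * The first three conditions are assumed stand-ins for the existing checks.
     */
    public static boolean followerShouldSendAddVoter(
            boolean autoJoinEnabled,
            boolean isLocalNodeVoter,
            boolean updateVoterSetTimerExpired,
            long localLogEndOffset,
            long leaderHighWatermark) {
        return autoJoinEnabled
            && !isLocalNodeVoter
            && updateVoterSetTimerExpired
            // Proposed clause: local node LEO >= leader HWM.
            && localLogEndOffset >= leaderHighWatermark;
    }

    /**
     * Leader-side option (the KAFKA-19933 idea): accept the AddVoter request only
     * when the leader considers the replica caught up. The boolean stands in for
     * the result of leaderState.isReplicaCaughtUp().
     */
    public static boolean leaderShouldAcceptAddVoter(boolean replicaCaughtUp) {
        return replicaCaughtUp;
    }

    public static void main(String[] args) {
        // A lagging observer (LEO 90, leader HWM 100) would not send AddVoter.
        System.out.println(followerShouldSendAddVoter(true, false, true, 90L, 100L));
        // Without the follower-side clause, the leader-side check still rejects it.
        System.out.println(leaderShouldAcceptAddVoter(false));
    }
}
```

With only the leader-side check, the lagging observer still sends the request and gets rejected; with the follower-side clause, the request is never sent in the first place.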

