showuon commented on PR #20859:
URL: https://github.com/apache/kafka/pull/20859#issuecomment-3584117480

   > Voter set is (A,B) with leader A. Node C is a caught up observer and tries 
to add itself. The new voter set (A,B,C) is committed by A and B, but C is 
partitioned and never learns its "auto-join" succeeded. The user removes C, and 
the voter set is back to (A,B). When the partition goes away, if updateVoterSet 
timer is expired, node C's shouldSendAddOrRemoveVoter evaluates to true and it 
will try to auto-join again.
   > I think in AddVoterHandler we check that the node being added is "caught 
up," but that check currently returns true if the node has fetched within the 
last hour (we can make this more strict obviously).
   
   Nice find! And I agree we can make the "caught up" criteria more strict to 
make sure the new added voter already knows the voters change, and then adopt 
the voter sets track in partitionState.updateState() solution below, we can 
resolve this issue because the state will move to `HAS_JOINED` when finding the 
voter contains itself. But we can handle the issue separately since it's not 
directly related to this issue. Opened 
[KAFKA-19933](https://issues.apache.org/jira/browse/KAFKA-19933).
   
   > Voter set is (A,B) with leader A. Node C is a caught up observer and tries 
to add itself. The new voter set (A,B,C) is committed by A and B, but C is 
partitioned and never learns its "auto-join" succeeded. The user removes C, and 
the voter set is back to (A,B). This is similar to the case above, but when the 
partition goes away, suppose node C's shouldSendAddOrRemoveVoter evaluates to 
false, and node C fetches instead of trying to auto-join. If this fetch 
contains both voter set changes in the same response (the latest voter set is 
(A,B)), the current code will still believe node C is in the HAS_NOT_JOINED 
state, and will attempt to auto-join again because we are only checking 
partitionState.lastVoterSet().
   > I still think we can handle this properly in the code, and it means we 
need transition HAS_NOT_JOINED -> HAS_JOINED if any VotersRecord from the 
FETCH_RESPONSE contains the local ReplicaKey. We go through the new voter sets 
in partitionState.updateState(), and remember we do not need the HWM here 
because to get to the HAS_NOT_JOINED state we had to be "caught up" at that 
point in time.
   
   Another good find! (How could you come up these edge cases :) ) Yes, I think 
tracking the voters set change in `partitionState.updateState()` should work. 
   
   
   @TaiJuWu , So in short, we have to do:
   1. When a node is the leader and the state is `UNKNOWN`, we make it to 
`HAS_JOINED`
   2. When a node is the follower (both voter and observer) and state is 
`UNKNOWN`, wait until the HWM updated, then check if the state is `HAS_JOINED` 
or `HAS_NOT_JOINED`.
   3. Implement the tracking the voters set change in 
`partitionState.updateState()` solution
   4. If in (3), we found the state is `HAS_NOT_JOINED` and local ReplicaKey 
appear in the voters set, we should change the state to `HAS_JOINED`.
   
   What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to