kevin-wu24 commented on PR #20859:
URL: https://github.com/apache/kafka/pull/20859#issuecomment-3587775523

   Thanks for the discussion @TaiJuWu and @showuon.
   
   > But for v4.2, I think we still need to have a temporary solution for it, 
which should be this PR. My imagination is after the improved KIP merged, we 
can revert the solutions in this PR. And in users point of view, there is no 
any change when using v4.2 and v4.3.
   
   But this implementation will still exist in the released 4.2 branch + 
binary. Again, although I've mentioned some cases, it is difficult to reason 
about the correctness of this client-side design. 
   
   My opinion is we should go with the KIP-design of server-side idempotency 
for auto-join additions if we are going to implement this semantic change at 
all. We should not try to hastily implement this change just to land it in 4.2, 
especially if we intend to just revert this change and do the server-side 
idempotency in 4.3. Remember that bugs with this implementation will also lead 
to confusing/bad UX. If we want to change the semantic, we should do it just 
once, with what we believe to be the best design + implementation. 
   
   I believe we have gotten in KIPs after KIP-freeze before right? This KIP 
would be self-contained within KRaft module and only affect the auto-join 
feature, so maybe we can make this an exception and get this in for 4.2.
   
   I hope my previous comments have shown that trying to implement this 
functionality client-side is not straightforward at all. Just because I have 
identified edge cases that the current design handles does not mean the current 
design handles all possible cases. I also will not have as much time to review 
this change like I had this week as we come up on 4.2 code freeze. I don't see 
the harm in this waiting until 4.3, so we can give this change in semantic the 
proper time for review. Let's also wait and see what @jsancio thinks about all 
this too.
   
   About your question @showuon:
   > We might need to add another clause in shouldSendAddOrRemoveVoter: && 
local node LEO >= leader HWM.
   > 
   > I can understand and agree with this check, the network partition issue 
will be resolved. But alternatively, I think we can also fix the issue by only 
improving KAFKA-19933 without this shouldSendAddOrRemoveVoter: && local node 
LEO >= leader HWM change. WDYT?
   
   Maybe? I can offer the following but I still need to think about this more. 
I am pretty sure that we need to implement KAFKA-19933 for the current design, 
but I do not know what that new "is caught up" check should be at the moment. I 
think the check on the leader is `auto-joing node's fetch offset >= leader's 
previous HWM`, but what is "previous HWM" here? This is yet another reason to 
do just the server-side approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to