kevin-wu24 commented on PR #20859: URL: https://github.com/apache/kafka/pull/20859#issuecomment-3587775523
Thanks for the discussion @TaiJuWu and @showuon. > But for v4.2, I think we still need to have a temporary solution for it, which should be this PR. My imagination is after the improved KIP merged, we can revert the solutions in this PR. And in users point of view, there is no any change when using v4.2 and v4.3. But this implementation will still exist in the released 4.2 branch + binary. Again, although I've mentioned some cases, it is difficult to reason about the correctness of this client-side design. My opinion is we should go with the KIP-design of server-side idempotency for auto-join additions if we are going to implement this semantic change at all. We should not try to hastily implement this change just to land it in 4.2, especially if we intend to just revert this change and do the server-side idempotency in 4.3. Remember that bugs with this implementation will also lead to confusing/bad UX. If we want to change the semantic, we should do it just once, with what we believe to be the best design + implementation. I believe we have gotten in KIPs after KIP-freeze before right? This KIP would be self-contained within KRaft module and only affect the auto-join feature, so maybe we can make this an exception and get this in for 4.2. I hope my previous comments have shown that trying to implement this functionality client-side is not straightforward at all. Just because I have identified edge cases that the current design handles does not mean the current design handles all possible cases. I also will not have as much time to review this change like I had this week as we come up on 4.2 code freeze. I don't see the harm in this waiting until 4.3, so we can give this change in semantic the proper time for review. Let's also wait and see what @jsancio thinks about all this too. About your question @showuon: > We might need to add another clause in shouldSendAddOrRemoveVoter: && local node LEO >= leader HWM. > > I can understand and agree with this check, the network partition issue will be resolved. But alternatively, I think we can also fix the issue by only improving KAFKA-19933 without this shouldSendAddOrRemoveVoter: && local node LEO >= leader HWM change. WDYT? Maybe? I can offer the following but I still need to think about this more. I am pretty sure that we need to implement KAFKA-19933 for the current design, but I do not know what that new "is caught up" check should be at the moment. I think the check on the leader is `auto-joing node's fetch offset >= leader's previous HWM`, but what is "previous HWM" here? This is yet another reason to do just the server-side approach. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
