urbandan commented on PR #12392: URL: https://github.com/apache/kafka/pull/12392#issuecomment-1261958861
@jolshan It is an idea; the first version of the PR tried to implement it, but the current state of the PR is based on the fatal state. Keeping the producer in a reusable state is tricky.

The issue is that, to fix the bug, we need to bump the epoch instead of aborting. Normally, an epoch bump results in a successful response from the coordinator containing the increased epoch, which the producer can then safely use to keep working. But bumping an epoch **during** an ongoing transaction is handled differently, because the coordinator assumes that producer fencing occurred (a new producer instance with the same transaction id started up). Because of this, the response to the bump does not contain an actual epoch: it kicks off the fencing operation and tells the producer to keep retrying until the fencing operation is finished. When that is done, the coordinator increases the epoch again and returns it to the new producer.

An important observation here is that there is an epoch which **is never returned to any producer** by the coordinator. We could rely on this fact by increasing the epoch on the client side to reach this "hidden" epoch, and then trying to bump the epoch again with it. If no other producer instance has fenced off the current producer, this will succeed, and we will get an increased epoch from the broker, meaning that the producer can safely continue. If another producer instance has fenced off the current instance, even this "hidden" epoch will be fenced.

In short, as I wrote in the other thread:

epoch=0 -> delivery timeout occurs -> send fencing InitPid with epoch=0 -> epoch=1 (on coordinator side) -> increase epoch on client side epoch=1 -> send another InitPid with epoch=1 -> safely acquire epoch=2
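
To make the epoch sequence above concrete, here is a toy simulation of it. This is only a sketch, not Kafka code: `ToyCoordinator`, `init_pid`, `recover_after_timeout`, and the `"OK"` / `"RETRY"` / `"FENCED"` statuses are hypothetical names, and the model assumes the fencing abort completes instantly, whereas the real coordinator asks the producer to keep retrying until it finishes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyCoordinator:
    """Hypothetical stand-in for the transaction coordinator (not Kafka code)."""
    epoch: int = 0
    txn_ongoing: bool = False

    def init_pid(self, client_epoch: Optional[int]) -> Tuple[str, Optional[int]]:
        # client_epoch=None models a freshly started producer instance,
        # which does not present an epoch of its own (a simplification).
        if client_epoch is not None and client_epoch < self.epoch:
            return ("FENCED", None)
        if self.txn_ongoing:
            # Bump during an ongoing transaction: treated as fencing.
            # The epoch is increased, but this "hidden" value is not returned.
            self.epoch += 1
            self.txn_ongoing = False  # assumption: the abort finishes immediately
            return ("RETRY", None)
        # Normal bump: the increased epoch is returned to the caller.
        self.epoch += 1
        return ("OK", self.epoch)


def recover_after_timeout(coordinator: ToyCoordinator, epoch: int):
    """Sketch of the proposed client-side recovery after a delivery timeout."""
    status, new_epoch = coordinator.init_pid(epoch)
    if status == "RETRY":
        # Step into the "hidden" epoch on the client side and bump again.
        status, new_epoch = coordinator.init_pid(epoch + 1)
    return status, new_epoch


# No competing instance: epoch 0 -> hidden epoch 1 -> acquired epoch 2.
coordinator = ToyCoordinator(epoch=0, txn_ongoing=True)
print(recover_after_timeout(coordinator, 0))  # ('OK', 2)

# A competing instance starts up between the two requests: we end up fenced.
coordinator = ToyCoordinator(epoch=0, txn_ongoing=True)
coordinator.init_pid(0)            # our fencing InitPid, coordinator moves to 1
coordinator.init_pid(None)         # the new instance acquires epoch 2
print(coordinator.init_pid(1))     # ('FENCED', None)
```

The second scenario is the reason the approach stays safe in this model: the client-side increment only succeeds when nobody else has bumped the epoch in the meantime.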
