urbandan commented on PR #12392:
URL: https://github.com/apache/kafka/pull/12392#issuecomment-1261958861

   @jolshan It is an idea; the first version of the PR tried to implement 
it, but the current state of the PR is based on the fatal state.
   
   Keeping the producer in a reusable state is tricky. 
The issue is that to fix the bug, we need to bump the epoch instead of aborting.
   Normally, an epoch bump results in a successful response from the 
coordinator, which contains the increased epoch, which then can be safely used 
by the producer to keep working. But bumping an epoch **during** an ongoing 
transaction is handled differently, because the coordinator assumes that a 
producer fencing occurred (a new producer instance with the same transaction id 
started up). Because of this, the response to the bump does not contain an 
actual epoch - it kicks off the fencing operation, and tells the producer to 
keep retrying until the fencing operation is finished. When that is done, the 
coordinator will increase the epoch again, and return it to the new producer.
   An important observation here is that there is an epoch which **is never 
returned to any producers** by the coordinator. We could rely on this fact by 
trying to use this "hidden" epoch, by increasing the epoch on the client side. 
Then we can try to bump the epoch again with this "hidden" epoch. If there were 
no other producer instances fencing off the current producer, this will 
succeed, and we will get an increased epoch from the broker, meaning that the 
producer can safely continue. If there was another producer instance fencing off 
the current instance, even this "hidden" epoch will be fenced anyway.
   
   In short, as I wrote in the other thread:
   epoch=0 -> delivery timeout occurs -> send fencing InitPid with epoch=0 -> 
epoch=1 (on coordinator side) -> increase epoch on client side epoch=1 -> send 
another InitPid with epoch=1 -> safely acquire epoch=2
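
   The sequence above can be sketched with a toy model. This is only an 
illustration of the reasoning, not the actual coordinator or producer code: 
`ToyCoordinator`, `init_pid`, `recover_after_delivery_timeout` and the status 
strings are hypothetical names, and the fencing abort is assumed to complete 
immediately.

```python
from dataclasses import dataclass

# Hypothetical status values for this sketch (not real Kafka error codes).
OK = "OK"
RETRY = "RETRY"    # fencing operation started, caller should retry
FENCED = "FENCED"  # caller's epoch is older than the coordinator's


@dataclass
class ToyCoordinator:
    """Toy stand-in for the transaction coordinator state of one transactional id."""
    epoch: int = 0
    txn_ongoing: bool = False

    def init_pid(self, client_epoch=None):
        """Handle an InitPid-like request.

        client_epoch=None models a brand new producer instance;
        an integer models an existing producer asking for an epoch bump.
        Returns (status, epoch_or_None).
        """
        if client_epoch is not None and client_epoch < self.epoch:
            return FENCED, None
        if self.txn_ongoing:
            # Bump during an ongoing transaction: treated as fencing.
            # The epoch is increased, but NOT returned to the caller.
            self.epoch += 1
            self.txn_ongoing = False  # assumption: the abort finishes at once
            return RETRY, None
        # Normal bump: increase the epoch and return it.
        self.epoch += 1
        return OK, self.epoch


def recover_after_delivery_timeout(coordinator, epoch):
    """Client-side flow: fencing bump, then retry with the 'hidden' epoch."""
    status, new_epoch = coordinator.init_pid(epoch)
    if status == RETRY:
        epoch += 1  # client-side increase to the epoch nobody was given
        status, new_epoch = coordinator.init_pid(epoch)
    return status, new_epoch
```

   With no competing producer, starting from `ToyCoordinator(epoch=0, 
txn_ongoing=True)` the helper ends with `(OK, 2)`. If a new instance calls 
`init_pid()` between the two requests, the second request with epoch=1 gets 
`FENCED` in this model, which matches the argument above.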

