guozhangwang edited a comment on pull request #9569:
URL: https://github.com/apache/kafka/pull/9569#issuecomment-725014616


   Okay, I think I need to rephrase my thoughts above: right now, when 
aborting a txn due to timeout, the txn coordinator 1) first bumps the 
epoch, 2) appends the prepare-abort log entry with the newly bumped epoch, and 
then 3) sends the txn-markers to the partition leaders with the newly bumped 
epoch. As a result, the partition leaders bump their local epoch of the 
pid as well.
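
   To make the three steps concrete, here is a rough Java sketch of the sequence. It is a toy model, not the broker code: the class names, fields, and method names (`TxnCoordinator`, `PartitionLeader`, `abortOnTimeout`, `onTxnMarker`, `accepts`) are all made up for illustration, and the log is just a list of strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: every name below is hypothetical, not the actual Kafka classes.
class TimeoutAbortSketch {

    // Tracks the latest epoch the partition leader has seen for each pid.
    static class PartitionLeader {
        final Map<Long, Short> epochByPid = new HashMap<>();

        // A txn marker carrying a higher epoch bumps the leader's local epoch.
        void onTxnMarker(long pid, short markerEpoch) {
            epochByPid.merge(pid, markerEpoch, (old, cur) -> cur > old ? cur : old);
        }

        // Records with an epoch older than the local one are rejected.
        boolean accepts(long pid, short recordEpoch) {
            return recordEpoch >= epochByPid.getOrDefault(pid, (short) 0);
        }
    }

    static class TxnCoordinator {
        final Map<Long, Short> epochByPid = new HashMap<>();
        final List<String> txnLog = new ArrayList<>();

        void abortOnTimeout(long pid, List<PartitionLeader> leaders) {
            // 1) bump the epoch
            short bumped = (short) (epochByPid.getOrDefault(pid, (short) 0) + 1);
            epochByPid.put(pid, bumped);
            // 2) append prepare-abort with the newly bumped epoch
            txnLog.add("PREPARE_ABORT pid=" + pid + " epoch=" + bumped);
            // 3) send txn markers with the newly bumped epoch to the leaders
            for (PartitionLeader leader : leaders) {
                leader.onTxnMarker(pid, bumped);
            }
        }
    }

    public static void main(String[] args) {
        PartitionLeader leader = new PartitionLeader();
        TxnCoordinator coordinator = new TxnCoordinator();
        coordinator.epochByPid.put(7L, (short) 3);
        leader.epochByPid.put(7L, (short) 3);

        coordinator.abortOnTimeout(7L, Collections.singletonList(leader));
        // The old producer still uses epoch 3 and is now rejected by the leader.
        System.out.println("old epoch accepted: " + leader.accepts(7L, (short) 3));
    }
}
```

   Note that in this model the leader only ever learns the new epoch number, which is the root of the issue discussed next.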
   
   This makes sense for preventing the old producer from appending more records 
with the old epoch, but as KIP-588 mentioned, it does not help differentiate 
the timeout scenario from the new initPID scenario, where only the latter actually 
involves a new producer. And although the txn coordinator may be able to tell the 
difference based on its old cached values, the partition leader cannot know 
at all, so forcing its fenced-producer error to always be translated to an 
invalid-producer-epoch error seems sub-optimal.
   
   IF we do want the partition leader to never return fenced-producer, 
then at least we should make that change in 
`ProducerStateManager#checkProducerEpoch` and not in `KafkaApis`, and change 
`InvalidProducerEpochException` to an `ApiException`, **but we'd need to document 
very clearly what users should do upon getting 
`InvalidProducerEpochException`**. IF we'd instead like the partition leader to be 
able to distinguish txn timeout from new initPID as well, we should let the txn 
markers piggyback that information. **The benefit is that we do not need 
to expose a new exception and users still only get `ProducerFencedException`.**
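
   As a rough illustration of the second option, the Java sketch below lets the marker carry the reason for the epoch bump so the partition leader can tell the two cases apart when it later sees a stale epoch. Everything here is an assumption made for illustration: the `BumpReason` enum, the extra marker field, and this toy `checkProducerEpoch` are not part of the existing marker format or of `ProducerStateManager`, and how each cause would map to an error code is deliberately left open.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these types exist in Kafka under these names.
class MarkerPiggybackSketch {

    // Assumed extra information piggybacked on the txn marker.
    enum BumpReason { TXN_TIMEOUT, NEW_INIT_PID }

    // Per-pid state kept by the partition leader: epoch plus why it was last bumped.
    static final class ProducerEntry {
        final short epoch;
        final BumpReason lastBumpReason;

        ProducerEntry(short epoch, BumpReason lastBumpReason) {
            this.epoch = epoch;
            this.lastBumpReason = lastBumpReason;
        }
    }

    static class PartitionLeader {
        final Map<Long, ProducerEntry> entries = new HashMap<>();

        // Only a marker with a strictly higher epoch updates the local state.
        void onTxnMarker(long pid, short markerEpoch, BumpReason reason) {
            ProducerEntry current = entries.get(pid);
            if (current == null || markerEpoch > current.epoch) {
                entries.put(pid, new ProducerEntry(markerEpoch, reason));
            }
        }

        // Empty if the record epoch is current; otherwise the cause of the bump
        // that made it stale, so the caller can pick an appropriate error.
        Optional<BumpReason> checkProducerEpoch(long pid, short recordEpoch) {
            ProducerEntry current = entries.get(pid);
            if (current == null || recordEpoch >= current.epoch) {
                return Optional.empty();
            }
            return Optional.of(current.lastBumpReason);
        }
    }

    public static void main(String[] args) {
        PartitionLeader leader = new PartitionLeader();
        leader.onTxnMarker(1L, (short) 5, BumpReason.TXN_TIMEOUT);
        // The same producer, still on epoch 4, learns it was timed out rather than replaced.
        System.out.println(leader.checkProducerEpoch(1L, (short) 4));
    }
}
```

   With something along these lines the leader would no longer need to collapse both cases into one error, at the cost of a marker format change.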
   



