kirktrue commented on code in PR #17022:
URL: https://github.com/apache/kafka/pull/17022#discussion_r2133073450


##########
clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java:
##########
@@ -779,14 +779,25 @@ public synchronized void maybeTransitionToErrorState(RuntimeException exception)
     }
 
     synchronized void handleFailedBatch(ProducerBatch batch, RuntimeException exception, boolean adjustSequenceNumbers) {
-        maybeTransitionToErrorState(exception);
+        // Compare the batch with the current ProducerIdAndEpoch. If the producer IDs are the *same* but the epochs
+        // are *different*, consider the batch as stale.
+        boolean isStaleBatch = batch.producerId() == producerIdAndEpoch.producerId && batch.producerEpoch() != producerIdAndEpoch.epoch;

Review Comment:
   Here are the places I found in which a `ProducerIdAndEpoch` is compared:
   
   * [`TransactionManager.setProducerIdAndEpoch()`](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L604-L615) checks the `producerId`, but that check appears to affect only logging.
   * [`TransactionManager.maybeUpdateProducerIdAndEpoch()`](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/TransactionManager.java#L589-L602) calls `hasStaleProducerIdAndEpoch()` to compare its current `ProducerIdAndEpoch` with the one in its `txnPartitionMap`. In the case we're seeing, the producer ID in the `ProducerBatch` is out of sync. I don't know if the `txnPartitionMap` is also out of sync in that case.
   * [`ProducerBatch.resetProducerState()`](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/ProducerBatch.java#L476-L481) seems worth considering: could it be called out of sync with the transaction manager? That method is called by [`TxnPartitionEntry.adjustSequencesDueToFailedBatch()`](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/TxnPartitionEntry.java#L139-L152), but it resets the batch with the same `ProducerIdAndEpoch` taken from the batch itself. Should it be consulting the `TransactionManager` for the _current_ value?
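
To make the last question concrete, here is a minimal, self-contained sketch of the comparison from the diff and of what happens when a batch is reset with its own identity versus the current one. The classes below are simplified stand-ins that I wrote for illustration; they are assumptions and not Kafka's actual `ProducerBatch` or `ProducerIdAndEpoch`, and the sketch says nothing about sequence numbers or `txnPartitionMap`.

```java
public class StaleBatchSketch {

    // Hypothetical stand-in for Kafka's ProducerIdAndEpoch: just an (id, epoch) pair.
    static final class ProducerIdAndEpoch {
        final long producerId;
        final short epoch;

        ProducerIdAndEpoch(long producerId, short epoch) {
            this.producerId = producerId;
            this.epoch = epoch;
        }
    }

    // Hypothetical stand-in for ProducerBatch: it only remembers the identity it was stamped with.
    static final class Batch {
        private ProducerIdAndEpoch stamped;

        Batch(ProducerIdAndEpoch stamped) {
            this.stamped = stamped;
        }

        long producerId() {
            return stamped.producerId;
        }

        short producerEpoch() {
            return stamped.epoch;
        }

        // Re-stamps the batch with whatever identity the caller passes in.
        void resetProducerState(ProducerIdAndEpoch idAndEpoch) {
            this.stamped = idAndEpoch;
        }
    }

    // Same rule as in the diff: same producer ID but a different epoch means the batch is stale.
    static boolean isStaleBatch(Batch batch, ProducerIdAndEpoch current) {
        return batch.producerId() == current.producerId
                && batch.producerEpoch() != current.epoch;
    }

    public static void main(String[] args) {
        ProducerIdAndEpoch current = new ProducerIdAndEpoch(42L, (short) 1);
        Batch batch = new Batch(new ProducerIdAndEpoch(42L, (short) 0));

        // The batch was stamped under an older epoch of the same producer ID.
        System.out.println(isStaleBatch(batch, current)); // prints "true"

        // Resetting with the batch's own identity changes nothing, so it stays stale.
        batch.resetProducerState(new ProducerIdAndEpoch(batch.producerId(), batch.producerEpoch()));
        System.out.println(isStaleBatch(batch, current)); // prints "true"

        // Resetting with the current identity brings the batch back in sync.
        batch.resetProducerState(current);
        System.out.println(isStaleBatch(batch, current)); // prints "false"

        // A batch with a different producer ID is not flagged by this rule at all.
        Batch other = new Batch(new ProducerIdAndEpoch(43L, (short) 0));
        System.out.println(isStaleBatch(other, current)); // prints "false"
    }
}
```

The last case may be worth a second look: under this rule, a batch whose producer ID differs from the current one is never treated as stale, so it would still go through the normal error path.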


