FrancisGodinho opened a new pull request, #21161: URL: https://github.com/apache/kafka/pull/21161
# Problem

During broker upgrades, the `sendOffsetsToTransaction` call would sometimes hang. Logs showed that it continuously returned `errorCode=51`, which is `CONCURRENT_TRANSACTIONS`. The test would eventually hit its timeout and fail. This happened for every version upgrade and occurred in around 30% of runs.

# Resolution

The problem above left the producer in a broken state that did not resolve itself, even after 5-10 minutes of waiting (a few minutes past the `transaction.max.ms` time). I tried multiple solutions, including waiting for extended periods and retrying `sendOffsetsToTransaction` several times whenever a timeout occurred. Unfortunately, the producer was permanently stuck and always received `errorCode=51`.

In this case, the recommended resolution in the Kafka docs is to close the previous producer and create a new one: https://kafka.apache.org/documentation/#usingtransactions

<img width="652" height="59" alt="image" src="https://github.com/user-attachments/assets/e95500d6-f1b6-44fa-b6a2-5c1800448d32" />

Using the old `transactional.id` would continue to lead to a stuck state, so this fix creates a brand new producer with a new ID and then rewinds the consumer offset to ensure exactly-once semantics (EOS).

# Testing and Validation

Previously, I was able to run the test for a single version upgrade and have it fail within the first 5-10 runs. After the fix, I was able to run it 40 times continuously with 0 failures. I also ran the full test (all versions) ~5 times with 9/9 cases passing.
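For readers who want the shape of the recovery at a glance, here is a minimal sketch of the flow described in the resolution: close the stuck producer, create a new one under a fresh ID, and rewind the consumer to its last committed offsets. It is not the code in this PR and does not use the Kafka client API. `FakeProducer`, `FakeConsumer`, `StuckTransactionError`, `recover`, and `commit_offsets` are hypothetical stand-ins invented for illustration, and the ID-suffix scheme is an assumption.

```python
import itertools
from dataclasses import dataclass, field


class StuckTransactionError(Exception):
    """Stand-in for the timeout seen while the broker keeps answering errorCode=51."""


@dataclass
class FakeProducer:
    """Hypothetical stand-in for a transactional producer; not the Kafka client API."""
    transactional_id: str
    stuck: bool = False
    closed: bool = False

    def send_offsets_to_transaction(self, offsets):
        # A stuck producer never gets past this call in the scenario described.
        if self.stuck:
            raise StuckTransactionError(self.transactional_id)
        return dict(offsets)

    def close(self):
        self.closed = True


@dataclass
class FakeConsumer:
    """Hypothetical stand-in tracking the read position and the committed offsets."""
    committed: dict
    position: dict = field(default_factory=dict)

    def rewind_to_committed(self):
        # Re-read from the last committed offsets so no record is skipped.
        self.position = dict(self.committed)


_suffix = itertools.count(1)  # assumed scheme for generating a fresh ID


def recover(producer, consumer, base_id):
    """Close the stuck producer, create one with a new ID, rewind the consumer."""
    producer.close()
    fresh = FakeProducer(transactional_id=f"{base_id}-{next(_suffix)}")
    consumer.rewind_to_committed()
    return fresh


def commit_offsets(producer, consumer, base_id):
    """Attach offsets to the transaction; swap producers if the call is stuck."""
    try:
        producer.send_offsets_to_transaction(consumer.position)
        return producer
    except StuckTransactionError:
        return recover(producer, consumer, base_id)
```

The rewind matters because any records read after the last committed offset belonged to the abandoned transaction; replaying them under the new producer is what keeps the copy from losing data.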
