[ https://issues.apache.org/jira/browse/KAFKA-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875704#comment-17875704 ]
Matthias J. Sax commented on KAFKA-17229:
-----------------------------------------

Great find. There is currently no WIP for the new processing thread, so we don't have a timeline atm. Thus, it might make sense to fix this on the old code path (hoping that the fix won't be too complex...). I am not super familiar (well, actually not at all) with the new processing thread code, and don't know whether it would fix the issue. Maybe [~cadonna] or [~lucasbru] could shed some light (for my own education).

> Multiple punctuators that together exceed the transaction timeout cause
> ProducerFencedException
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17229
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17229
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.8.0
>            Reporter: Trevan Richins
>            Priority: Major
>         Attachments: always-forward-failure.log, topic-input-failure.log
>
> If a single StreamThread has multiple punctuator tasks and their total
> run time exceeds the transaction timeout setting, ProducerFencedExceptions
> will occur.
>
> For example, in my test case, I have an input topic with 10 partitions, a
> processor with a punctuator that just sleeps for 5 seconds (the transaction
> timeout is 10s, so each punctuator finishes within the timeout), and an
> output topic. The punctuators run every 30 seconds (wall clock). Once the
> app is running and is inside one of the punctuators, I put one record in
> the input topic. The punctuators all finish and the record is seen and
> read, but it isn't committed because the punctuators run again (since it
> has been 30s since they last started). After the punctuators finish this
> second time, the app tries to commit the transaction it started 50 seconds
> ago, which triggers the ProducerFencedException.
>
> Another test case, with the same scenario, is having the punctuators
> forward something.
> This also causes a ProducerFencedException because the first punctuator
> starts a transaction, but the transaction isn't committed until all of the
> punctuators are done, which is long after the transaction timeout.
>
> The issue doesn't exist if there is only one partition, as the single
> punctuator finishes within the transaction timeout. It only occurs when
> there are multiple punctuators whose total run time exceeds the transaction
> timeout.
>
> It feels like what is needed is for Kafka to check after each punctuator
> whether there is data that needs to be committed. If there is, it commits
> then.
>
> I've attached a log of the first test case, called
> "topic-input-failure.log". It starts after the punctuators run the first
> time. It shows the record being received and the transaction starting.
> Then the punctuators run again, and each sleeps for 5 seconds. Once they
> are done, a ProducerFencedException is triggered.
>
> I've attached a log for the second test case, called
> "always-forward-failure.log". It starts when the punctuators run the first
> time. It shows the punctuators forwarding a record and sleeping for 5
> seconds. In this case, only 5 punctuators run as a group. An
> InvalidProducerEpochException occurs after the 5th punctuator finishes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
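The arithmetic behind the reported failure can be sketched in plain Java. This is a hypothetical model of the timing only, not Kafka Streams code: it assumes the reporter's numbers (10 tasks on one StreamThread, each punctuator sleeping 5 seconds, a 10-second transaction timeout) and that the punctuators run sequentially on the thread while a single transaction stays open.

```java
// Hypothetical timing model of KAFKA-17229 (not Kafka code).
// One StreamThread runs its wall-clock punctuators back to back and only
// attempts the commit after all of them have finished, so the open
// transaction's age is the sum of the individual punctuation times.
import java.util.stream.IntStream;

public class PunctuatorTimingSketch {
    static final int NUM_TASKS = 10;            // one task per input partition
    static final int PUNCTUATE_SECONDS = 5;     // each punctuator sleeps 5s
    static final int TXN_TIMEOUT_SECONDS = 10;  // transaction.timeout.ms = 10s

    /** Seconds the transaction has been open when the commit is attempted. */
    static int openTransactionSeconds() {
        // Punctuators run sequentially on the thread, so their times add up.
        return IntStream.range(0, NUM_TASKS)
                        .map(i -> PUNCTUATE_SECONDS)
                        .sum();
    }

    public static void main(String[] args) {
        int open = openTransactionSeconds();
        System.out.println("Transaction open for " + open
                + "s; timeout is " + TXN_TIMEOUT_SECONDS + "s");
        if (open > TXN_TIMEOUT_SECONDS) {
            // The broker has already aborted the transaction and bumped the
            // epoch, so the late commit surfaces as a ProducerFencedException.
            System.out.println("commit arrives after the timeout: fenced");
        }
    }
}
```

Each punctuator individually fits inside the 10s timeout, but 10 of them in sequence hold the transaction open for 50s, which is why only the multi-partition case fails.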