[ https://issues.apache.org/jira/browse/KAFKA-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871515#comment-17871515 ]
Matthias J. Sax commented on KAFKA-17229:
-----------------------------------------

Thanks for the clarification. This makes sense to me now :)

{quote}Why does Kafka Streams only check after all the punctuators run?{quote}

Originally, when Kafka Streams was created, there were no transactions; the code was simply written this way and never modified since. While it won't be straightforward to fix, with some clever refactoring it should be possible to actually do this. I don't see any fundamental reason why we would not be able to change it.

> Multiple punctuators that together exceed the transaction timeout cause ProducerFencedException
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17229
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17229
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.8.0
>            Reporter: Trevan Richins
>            Priority: Major
>         Attachments: always-forward-failure.log, topic-input-failure.log
>
> If a single StreamThread has multiple punctuator tasks and their combined runtime exceeds the transaction timeout setting, ProducerFencedExceptions will occur.
>
> For example, in my test case, I have an input topic with 10 partitions, a processor with a punctuator that just sleeps for 5 seconds (the transaction timeout is 10s, so each punctuator individually finishes within the timeout), and an output topic. The punctuators run every 30 seconds (wall clock). Once the app is running and is inside one of the punctuators, I put one record in the input topic. The punctuators all finish and the record is seen and read, but it isn't committed because the punctuators run again (since it has been 30s since they last started). After the punctuators finish this second time, the app tries to commit the transaction it started 50 seconds earlier and triggers the ProducerFencedException.
>
> Another test case, with the same scenario, has the punctuators forward something. This also causes a ProducerFencedException, because the first punctuator starts a transaction but the transaction isn't committed until all of the punctuators are done, which is long after the transaction timeout.
>
> The issue doesn't exist if there is only one partition, as the single punctuator finishes within the transaction timeout. It occurs only when there are multiple punctuators whose total runtime exceeds the transaction timeout.
>
> It feels like what is needed is for Kafka to check after each punctuator whether there is data that needs to be committed. If there is, it should commit then.
>
> I've attached a log of the first test case, called "topic-input-failure.log". It starts after the punctuators run the first time. It shows the record being received and the transaction starting. Then the punctuators run again and each sleeps for 5 seconds. Once they are done, a ProducerFencedException is triggered.
>
> I've attached a log for the second test case, called "always-forward-failure.log". It starts when the punctuators run the first time. It shows the punctuators forwarding a record and sleeping for 5 seconds. In this case, only 5 punctuators run as a group. An InvalidProducerEpochException occurs after the 5th punctuator finishes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
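To make the arithmetic behind the report concrete, here is a minimal standalone sketch of the timing model: one StreamThread runs the punctuators for all 10 tasks back-to-back, the transaction is opened when the first punctuator forwards a record, and the commit only happens after the whole round finishes. All class and variable names are illustrative, not Kafka Streams internals.

```java
// Hypothetical timing model of KAFKA-17229: 10 partitions, one punctuator
// per task, each punctuator taking ~5s, transaction.timeout.ms = 10s.
public class PunctuatorTimeoutModel {
    public static void main(String[] args) {
        int partitions = 10;             // one punctuator per task/partition
        long punctuatorMillis = 5_000L;  // each punctuator sleeps ~5 seconds
        long txnTimeoutMillis = 10_000L; // producer transaction.timeout.ms

        // The transaction opens when the first punctuator forwards a record,
        // but Kafka Streams commits only after ALL punctuators in the round
        // have finished, so the transaction stays open for the full round.
        long txnOpenMillis = partitions * punctuatorMillis;

        System.out.println("Transaction stays open for " + txnOpenMillis + " ms");
        System.out.println("Broker aborts the txn after " + txnTimeoutMillis + " ms");
        System.out.println("Producer gets fenced: " + (txnOpenMillis > txnTimeoutMillis));
    }
}
```

This also shows why a single partition is safe in the reported setup: 5 s is below the 10 s timeout, so the problem appears only once the per-round total crosses it.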