[ https://issues.apache.org/jira/browse/KAFKA-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871428#comment-17871428 ]
Trevan Richins commented on KAFKA-17229:
----------------------------------------

{quote}Why do you let the punctuator sleep? Sounds like an anti-pattern?{quote}

It is for this small example. The real punctuators that I have don't sleep. Instead, they loop through the entire state store to find entries that have expired.

{quote}Thus, it seems not to be an issue with Kafka Streams, but rather that the punctuators are running for too long, and don't give control back to the KS runtime quickly enough.{quote}

No individual punctuator runs too long. The problem occurs when there are multiple punctuators whose combined runtime exceeds the timeout. If Kafka Streams checked the status of the transaction after each punctuator, this issue would not happen. Why does Kafka Streams only check after all the punctuators have run?

> Multiple punctuators that together exceed the transaction timeout cause
> ProducerFencedException
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17229
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17229
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.8.0
>            Reporter: Trevan Richins
>            Priority: Major
>         Attachments: always-forward-failure.log, topic-input-failure.log
>
>
> If a single StreamThread has multiple punctuator tasks and their combined
> runtime exceeds the transaction timeout setting, ProducerFencedExceptions
> will occur.
> For example, in my test case, I have an input topic with 10 partitions, a
> processor with a punctuator that just sleeps for 5 seconds (the transaction
> timeout is 10s, so each punctuator on its own finishes within the timeout),
> and an output topic. The punctuators run every 30 seconds (wall clock).
> Once the app is running and is inside one of the punctuators, I put one
> record in the input topic.
> The punctuators will all finish and the record will be seen and read, but
> it won't commit because the punctuators run again (since it has been 30s
> since they last started). After the punctuators finish this second time, it
> will try to commit the transaction that it started 50 seconds ago, which
> triggers the ProducerFencedException.
> Another test case, with the same scenario, is having the punctuators
> forward something. This also causes a ProducerFencedException, because the
> first punctuator starts a transaction but the transaction isn't committed
> until all of the punctuators are done, long after the transaction timeout.
> The issue doesn't exist if there is only one partition, as the single
> punctuator finishes within the transaction timeout. It appears only when
> there are multiple punctuators whose total runtime exceeds the transaction
> timeout.
> It feels like what is needed is for Kafka Streams to check after each
> punctuator whether there is data that needs to be committed. If there is,
> it commits then.
>
> I've attached a log of the first test case, called
> "topic-input-failure.log". It starts after the punctuators run the first
> time. It shows the record being received and the transaction starting. Then
> the punctuators run again and each sleeps for 5 seconds. Once they are
> done, a ProducerFencedException is triggered.
> I've attached a log for the second test case, called
> "always-forward-failure.log". It starts when the punctuators run the first
> time. It shows the punctuators forwarding a record and sleeping for 5
> seconds. In this case, only 5 punctuators run as a group. An
> InvalidProducerEpochException occurs after the 5th punctuator finishes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
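The timing arithmetic in the report can be sketched with a small stand-alone simulation (plain Java, no Kafka dependency; the class and method names are illustrative, not from Kafka Streams). It uses the numbers from the test case above (5s per punctuator, 10 punctuators, a 10s transaction timeout) and contrasts the current behavior (commit only after all punctuators have run) with the proposed behavior of checking for pending commits after each punctuator:

```java
import java.time.Duration;

public class PunctuatorTimeoutSim {
    static final Duration PUNCTUATOR_RUNTIME = Duration.ofSeconds(5);   // each punctuator sleeps 5s
    static final Duration TRANSACTION_TIMEOUT = Duration.ofSeconds(10); // transaction timeout is 10s
    static final int PUNCTUATORS = 10;                                  // one per task/partition

    // Current behavior: a transaction opened around the first punctuator stays
    // open until every punctuator on the StreamThread has run.
    static Duration openAgeCommitAtEnd() {
        return PUNCTUATOR_RUNTIME.multipliedBy(PUNCTUATORS); // 5s * 10 = 50s
    }

    // Proposed behavior: commit pending data after each punctuator, so no
    // transaction stays open longer than a single punctuator run.
    static Duration openAgeCommitAfterEach() {
        return PUNCTUATOR_RUNTIME; // 5s
    }

    static String verdict(Duration openAge) {
        return openAge.compareTo(TRANSACTION_TIMEOUT) > 0 ? "ProducerFencedException" : "ok";
    }

    public static void main(String[] args) {
        System.out.println("commit at end:     transaction open "
                + openAgeCommitAtEnd().toSeconds() + "s -> " + verdict(openAgeCommitAtEnd()));
        System.out.println("commit after each: transaction open "
                + openAgeCommitAfterEach().toSeconds() + "s -> " + verdict(openAgeCommitAfterEach()));
    }
}
```

Running it prints that with commit-at-end the transaction is open 50s against a 10s timeout (fenced), while committing after each punctuator keeps every transaction under the timeout.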