[ 
https://issues.apache.org/jira/browse/KAFKA-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875704#comment-17875704
 ] 

Matthias J. Sax commented on KAFKA-17229:
-----------------------------------------

Great find. There is currently no WIP for the new processing thread, so we 
don't have a timeline atm. Thus, it might make sense to fix it on the old code 
path (hoping that the fix won't be too complex...).

I am not super familiar (well, actually not at all) with the new processing 
thread code, so I can't say whether it would fix the issue. Maybe [~cadonna] 
or [~lucasbru] could shed some light (for my own education).

> Multiple punctuators that together exceed the transaction timeout cause 
> ProducerFencedException
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-17229
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17229
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.8.0
>            Reporter: Trevan Richins
>            Priority: Major
>         Attachments: always-forward-failure.log, topic-input-failure.log
>
>
> If a single StreamThread has multiple punctuator tasks and the sum total of 
> their runtimes exceeds the transaction timeout setting, 
> ProducerFencedExceptions will occur.
> For example, in my test case, I have an input topic with 10 partitions, a 
> processor with a punctuator that just sleeps for 5 seconds (the transaction 
> timeout is 10s so it finishes within the timeout), and an output topic.  The 
> punctuators run every 30 seconds (wall clock).  Once the app is running and 
> is inside one of the punctuators, I put one record in the input topic.  The 
> punctuators will all finish and the record will be seen and read but it won't 
> commit because the punctuators run again (since it has been 30s since they 
> last started).  After the punctuators finish this second time, it will try to 
> commit the transaction that it started 50 seconds ago and will trigger the 
> ProducerFencedException.
> Another test case, with the same scenario, is having the punctuators forward 
> something.  This also causes a ProducerFencedException because the first 
> punctuator starts a transaction but it doesn't commit the transaction till 
> all of the punctuators are done and that is long after the transaction 
> timeout.
> The issue doesn't exist if there is only one partition as the single 
> punctuator will finish within the transaction timeout.  It is only when there 
> are multiple punctuators that exceed the transaction timeout in total.
> It feels like what is needed is for Kafka to check after each punctuator if 
> there is data that needs to be committed.  If there is, it commits then.
>  
> I've attached a log of the first test case.  It is called 
> "topic-input-failure.log".  It starts after the punctuators run the first 
> time.  It shows the record being received and the transaction starting.  Then 
> it runs the punctuators again and they each sleep for 5 seconds.  Once they 
> are done, it triggers a ProducerFencedException.
> I've attached a log for the second test case.  It is called 
> "always-forward-failure.log".  It starts when the punctuators run the first 
> time.  It shows the punctuators forwarding a record and sleeping for 5 
> seconds.  In this case, only 5 punctuators run as a group.  An 
> InvalidProducerEpochException occurs after the 5th punctuator finishes.
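
A quick sanity check on the first scenario: the arithmetic alone shows why the
commit must fail. With 10 punctuators sleeping ~5s each on one StreamThread,
the open transaction is about 50s old by the time the commit is attempted, far
past the 10s transaction timeout. A minimal sketch in plain Python, using the
numbers from the report above:

```python
# Timeline sketch for the first test case (numbers taken from the report).
TRANSACTION_TIMEOUT_S = 10  # transaction timeout configured to 10s
PUNCTUATOR_SLEEP_S = 5      # each punctuator sleeps for 5s
NUM_PUNCTUATORS = 10        # one punctuator per input partition

# The record is read right after the first punctuation pass, which opens a
# transaction.  Before the thread can commit, the 30s wall-clock interval
# elapses and all punctuators run again, serially, on the same StreamThread.
transaction_age_at_commit = NUM_PUNCTUATORS * PUNCTUATOR_SLEEP_S

print(f"transaction age at commit attempt: {transaction_age_at_commit}s")
print(f"transaction timeout:               {TRANSACTION_TIMEOUT_S}s")

# The broker has long since aborted the transaction, so the commit attempt
# surfaces as a ProducerFencedException.
assert transaction_age_at_commit > TRANSACTION_TIMEOUT_S
```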
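
The commit-after-each-punctuator idea suggested above can be sketched as a
small timing simulation (plain Python; `punctuation_pass` and the commit check
are hypothetical stand-ins for illustration, not Kafka Streams internals). The
point is that committing pending data after each punctuator bounds the age of
any open transaction by a single punctuator's duration:

```python
PUNCTUATOR_SLEEP_S = 5      # each punctuator runs for 5s
NUM_PUNCTUATORS = 10        # one punctuator per input partition
TRANSACTION_TIMEOUT_S = 10  # transaction timeout configured to 10s

def punctuation_pass(commit_after_each: bool) -> int:
    """Return the maximum age (seconds) an open transaction reaches
    during one full punctuation pass over all tasks."""
    open_tx_age = 0
    max_age = 0
    for _ in range(NUM_PUNCTUATORS):
        open_tx_age += PUNCTUATOR_SLEEP_S  # punctuator runs (and may forward)
        max_age = max(max_age, open_tx_age)
        if commit_after_each:
            open_tx_age = 0  # pending data committed; next tx starts fresh
    return max_age

# Current behavior: one commit after the whole pass -> 50s-old transaction.
assert punctuation_pass(commit_after_each=False) > TRANSACTION_TIMEOUT_S

# Proposed behavior: no transaction outlives a single punctuator.
assert punctuation_pass(commit_after_each=True) <= TRANSACTION_TIMEOUT_S
```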



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
