[ https://issues.apache.org/jira/browse/KAFKA-9268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270944#comment-17270944 ]
Matthias J. Sax commented on KAFKA-9268:
----------------------------------------
Should we close this issue? It seems fixed via KIP-360 (2.5.0 release) and it's
unlikely that there will be a bug-fix release for older versions.
> Follow-on: Streams Threads may die from recoverable errors with EOS enabled
> ---------------------------------------------------------------------------
>
> Key: KAFKA-9268
> URL: https://issues.apache.org/jira/browse/KAFKA-9268
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.2.0
> Reporter: John Roesler
> Priority: Major
> Attachments: 2.2-eos-failures-1.txt, 2.2-eos-failures-2.txt
>
>
> While testing Streams in EOS mode under frequent and heavy network
> partitions, I've encountered exceptions leading to thread death in both 2.2
> and 2.3 (although the exceptions differ between the two versions).
> I believe this problem is addressed in 2.4+ by
> https://issues.apache.org/jira/browse/KAFKA-9231 . However, if you look at
> the ticket and corresponding PR, you will see that the solution there
> introduced some tech debt around UnknownProducerId that needs to be cleaned
> up. Therefore, I'm not backporting that fix to older branches. Rather, I'm
> opening a new ticket to make more conservative changes in older branches to
> improve resilience, if desired.
> These failures are relatively rare, so I don't think that a system or
> integration test could reliably reproduce them. The steps to reproduce would be:
> 1. set up a long-running Streams application with EOS enabled (I used three
> Streams instances)
> 2. inject periodic network partitions (I had each Streams instance schedule
> an interruption at a random time between 0 and 3 hours, then schedule the
> interruption to last a random duration between 0 and 5 minutes. The
> interruptions are accomplished by using iptables to drop all traffic to/from
> all three brokers)
> As for the actual errors I've observed, I'm attaching the logs of two
> incidents in which a thread shut down.
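
For anyone trying to follow the reproduction steps quoted above, here is a minimal sketch of how the fault-injection schedule in step 2 might be planned. It is not taken from the ticket or its attachments: the class name PartitionPlan, its methods, and the broker addresses (documentation-range IPs) are hypothetical, and the program only prints the iptables commands as a dry run instead of executing them. Only the bounds (0 to 3 hours before an interruption, 0 to 5 minutes of outage) come from the description.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: plans one network interruption as described in step 2.
final class PartitionPlan {
    // Upper bounds taken from the issue description.
    static final Duration MAX_DELAY = Duration.ofHours(3);
    static final Duration MAX_OUTAGE = Duration.ofMinutes(5);

    // Returns a uniformly random duration in [0, max).
    static Duration randomUpTo(Random rnd, Duration max) {
        return Duration.ofMillis((long) (rnd.nextDouble() * max.toMillis()));
    }

    // Builds iptables commands that add ("-A") or delete ("-D") rules
    // dropping all traffic from and to each broker address.
    static List<String> rules(String action, List<String> brokerIps) {
        List<String> cmds = new ArrayList<>();
        for (String ip : brokerIps) {
            cmds.add("iptables " + action + " INPUT -s " + ip + " -j DROP");
            cmds.add("iptables " + action + " OUTPUT -d " + ip + " -j DROP");
        }
        return cmds;
    }

    public static void main(String[] args) {
        // Assumed broker addresses; replace with the real ones.
        List<String> brokers = List.of("192.0.2.1", "192.0.2.2", "192.0.2.3");
        Random rnd = new Random();

        // Dry run: print the plan as shell commands rather than running them.
        System.out.println("sleep " + randomUpTo(rnd, MAX_DELAY).getSeconds());
        rules("-A", brokers).forEach(System.out::println);
        System.out.println("sleep " + randomUpTo(rnd, MAX_OUTAGE).getSeconds());
        rules("-D", brokers).forEach(System.out::println);
    }
}
```

Running the printed commands requires root on each Streams host. Deleting the rules with -D after the outage restores connectivity, and the plan can then be regenerated for the next interruption.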