[ https://issues.apache.org/jira/browse/KAFKA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044030#comment-17044030 ]
Guozhang Wang commented on KAFKA-9572: -------------------------------------- I looked into the source code of 2.4 and 2.5 and did not find any new issues other than conjectured https://issues.apache.org/jira/browse/KAFKA-8574 – the logs themselves cannot further validates whether it was the root cause, but at least it shows that some restoring tasks are closed before restoration is completed, which could possibly lead to the bug of KAFKA-8574. This bug is fixed as part of the tech debt cleanup as in KAFKA-9113. So I think I have about 60 percent confidence that this issue is no longer there in 2.6 but it would still be in 2.4 and 2.5 since the fix itself incurs a lot of the cleanup it is hard to cherry-pick to older branches. I'd suggest we close this ticket for 2.6 only and if StreamsEosTest.test_failure_and_recovery failed on trunk again we could look into this once more. WDYT [~cadonna] [~apurva] > Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some > Records > ----------------------------------------------------------------------------------- > > Key: KAFKA-9572 > URL: https://issues.apache.org/jira/browse/KAFKA-9572 > Project: Kafka > Issue Type: Bug > Components: streams > Affects Versions: 2.4.0 > Reporter: Bruno Cadonna > Assignee: Guozhang Wang > Priority: Blocker > Fix For: 2.5.0 > > Attachments: 7-changelog-1.txt, data-1.txt, streams22.log, > streams23.log, streams30.log, sum-1.txt > > > System test {{StreamsEosTest.test_failure_and_recovery}} failed due to a > wrongly computed aggregation under exactly-once (EOS). The specific error is: > {code:java} > Exception in thread "main" java.lang.RuntimeException: Result verification > failed for ConsumerRecord(topic = sum, partition = 1, leaderEpoch = 0, offset > = 2805, CreateTime = 1580719595164, serialized key size = 4, serialized value > size = 8, headers = RecordHeaders(headers = [], isReadOnly = false), key = > [B@6c779568, value = [B@f381794) expected <6069,17269> but was <6069,10698> > at > org.apache.kafka.streams.tests.EosTestDriver.verifySum(EosTestDriver.java:444) > at > org.apache.kafka.streams.tests.EosTestDriver.verify(EosTestDriver.java:196) > at > org.apache.kafka.streams.tests.StreamsEosTest.main(StreamsEosTest.java:69) > {code} > That means, the sum computed by the Streams app seems to be wrong for key > 6069. I checked the dumps of the log segments of the input topic partition > (attached: data-1.txt) and indeed two input records are not considered in the > sum. With those two missed records the sum would be correct. More concretely, > the input values for key 6069 are: > # 147 > # 9250 > # 5340 > # 1231 > # 1301 > The sum of this values is 17269 as stated in the exception above as expected > sum. If you subtract values 3 and 4, i.e., 5340 and 1231 from 17269, you get > 10698 , which is the actual sum in the exception above. Somehow those two > values are missing. > In the log dump of the output topic partition (attached: sum-1.txt), the sum > is correct until the 4th value 1231 , i.e. 15968, then it is overwritten with > 10698. > In the log dump of the changelog topic of the state store that stores the sum > (attached: 7-changelog-1.txt), the sum is also overwritten as in the output > topic. > I attached the logs of the three Streams instances involved. -- This message was sent by Atlassian Jira (v8.3.4#803005)