[ 
https://issues.apache.org/jira/browse/KAFKA-9572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041328#comment-17041328
 ] 

Guozhang Wang commented on KAFKA-9572:
--------------------------------------

It seems that when we injected the error the local state would be wiped since 
we are closing dirty, and then the tasks got migrated again while the 
restoration has not completed yet -- in this case we should just update the 
checkpoint file without committing at all. However in trunk right now this was 
not done correctly, I will try to piggy-back this fix along with my ongoing PR 
for handling exceptions: https://github.com/apache/kafka/pull/8058

> Sum Computation with Exactly-Once Enabled and Injected Failures Misses Some 
> Records
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-9572
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9572
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.4.0
>            Reporter: Bruno Cadonna
>            Assignee: Guozhang Wang
>            Priority: Blocker
>             Fix For: 2.5.0
>
>         Attachments: 7-changelog-1.txt, data-1.txt, streams22.log, 
> streams23.log, streams30.log, sum-1.txt
>
>
> System test {{StreamsEosTest.test_failure_and_recovery}} failed due to a 
> wrongly computed aggregation under exactly-once (EOS). The specific error is:
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: Result verification 
> failed for ConsumerRecord(topic = sum, partition = 1, leaderEpoch = 0, offset 
> = 2805, CreateTime = 1580719595164, serialized key size = 4, serialized value 
> size = 8, headers = RecordHeaders(headers = [], isReadOnly = false), key = 
> [B@6c779568, value = [B@f381794) expected <6069,17269> but was <6069,10698>
>       at 
> org.apache.kafka.streams.tests.EosTestDriver.verifySum(EosTestDriver.java:444)
>       at 
> org.apache.kafka.streams.tests.EosTestDriver.verify(EosTestDriver.java:196)
>       at 
> org.apache.kafka.streams.tests.StreamsEosTest.main(StreamsEosTest.java:69)
> {code} 
> That means, the sum computed by the Streams app seems to be wrong for key 
> 6069. I checked the dumps of the log segments of the input topic partition 
> (attached: data-1.txt) and indeed two input records are not considered in the 
> sum. With those two missed records the sum would be correct. More concretely, 
> the input values for key 6069 are:
> # 147
> # 9250
> # 5340 
> # 1231
> # 1301
> The sum of this values is 17269 as stated in the exception above as expected 
> sum. If you subtract values 3 and 4, i.e., 5340 and 1231 from 17269, you get 
> 10698 , which is the actual sum in the exception above. Somehow those two 
> values are missing.
> In the log dump of the output topic partition (attached: sum-1.txt), the sum 
> is correct until the 4th value 1231 , i.e. 15968, then it is overwritten with 
> 10698.
> In the log dump of the changelog topic of the state store that stores the sum 
> (attached: 7-changelog-1.txt), the sum is also overwritten as in the output 
> topic.
> I attached the logs of the three Streams instances involved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to