[ 
https://issues.apache.org/jira/browse/KAFKA-10391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-10391.
-----------------------------------
    Fix Version/s: 2.7.0
       Resolution: Fixed

> Streams should overwrite checkpoint excluding corrupted partitions
> ------------------------------------------------------------------
>
>                 Key: KAFKA-10391
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10391
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>            Priority: Major
>             Fix For: 2.7.0
>
>
> While working on https://issues.apache.org/jira/browse/KAFKA-9450 I 
> discovered another bug in Streams: when some partitions are corrupted due to 
> out-of-range offsets, we treat the task as corrupted, close it as dirty, and 
> then revive it. However, we forget to overwrite the checkpoint file with 
> those out-of-range partitions excluded, which is what would let them be 
> re-bootstrapped from the new log-start offset. Hence, when the task is 
> revived, it still loads the old offset, starts from there, and gets the 
> out-of-range exception again. This may cause 
> {{StreamsUpgradeTest.test_app_upgrade}} to be flaky.
> We do not see this often because in the past we always deleted the checkpoint 
> file after loading it, and we usually only see the out-of-range exception at 
> the beginning of restoration, not during restoration.
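
The fix described above can be illustrated with a minimal sketch. This is not
the actual Streams code: the class name CheckpointSketch, the method
excludeCorrupted, and the use of plain strings for changelog partitions are
all assumptions made for illustration. The idea is only that the offsets
written back to the checkpoint file should omit the corrupted partitions, so
a revived task finds no stale offset for them.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class CheckpointSketch {

    // Hypothetical helper (not a Kafka API): returns the offsets that should be
    // written back to the checkpoint file, leaving out the corrupted partitions.
    // The input map is copied, so the caller's map is not modified.
    public static Map<String, Long> excludeCorrupted(Map<String, Long> checkpointed,
                                                     Set<String> corrupted) {
        Map<String, Long> toWrite = new HashMap<>(checkpointed);
        toWrite.keySet().removeAll(corrupted);
        return toWrite;
    }

    public static void main(String[] args) {
        // Made-up partition names and offsets, for illustration only.
        Map<String, Long> checkpointed = new HashMap<>();
        checkpointed.put("app-store-changelog-0", 100L);
        checkpointed.put("app-store-changelog-1", 250L);

        // Partition 1 hit an out-of-range offset, so it is dropped from the
        // checkpoint and would be restored from the log-start offset instead.
        Map<String, Long> toWrite =
            excludeCorrupted(checkpointed, Set.of("app-store-changelog-1"));
        System.out.println(toWrite);
    }
}
```

With the corrupted partition absent from the checkpoint, the revived task has
no old offset to load for it, which is what breaks the repeated out-of-range
loop described in the issue.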



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
