[
https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044886#comment-17044886
]
ASF GitHub Bot commented on KAFKA-9607:
---------------------------------------
abbccdda commented on pull request #8168: KAFKA-9607: Partition group should
not be cleared if task will be revived
URL: https://github.com/apache/kafka/pull/8168
This PR fixes the illegal state bug where a task gets revived but has no
input partition assigned anymore.
### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Should not clear partition group if the task will be revived again
> ------------------------------------------------------------------
>
> Key: KAFKA-9607
> URL: https://issues.apache.org/jira/browse/KAFKA-9607
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Reporter: Boyang Chen
> Assignee: Boyang Chen
> Priority: Major
>
> We detected an issue with a corrupted task failed to revive:
> {code:java}
> [2020-02-25T08:23:38-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,137] INFO
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> stream-thread
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle
> new assignment with:
> New active tasks: [0_0, 3_1]
> New standby tasks: []
> Existing active tasks: [0_0]
> Existing standby tasks: []
> (org.apache.kafka.streams.processor.internals.TaskManager)
> [2020-02-25T08:23:38-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,138] INFO
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> [Consumer
> clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
> groupId=stream-soak-test] Adding newly assigned partitions:
> k8sName-id-repartition-1
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-02-25T08:23:38-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,138] INFO
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> stream-thread
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State
> transition from RUNNING to PARTITIONS_ASSIGNED
> (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-02-25T08:23:39-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,419] WARN
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> stream-thread
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException
> fetching records from restore consumer for partitions
> [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-0000000049-changelog-1], it
> is likely that the consumer's position has fallen out of the topic partition
> offset range because the topic was truncated or compacted on the broker,
> marking the corresponding tasks as corrupted and re-initializingit later.
> (org.apache.kafka.streams.processor.internals.StoreChangelogReader)
> [2020-02-25T08:23:38-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,139] INFO
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> [Consumer
> clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
> groupId=stream-soak-test] Setting offset for partition
> k8sName-id-repartition-1 to the committed offset
> FetchPosition{offset=3592242, offsetEpoch=Optional.empty,
> currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092
> (id: 1003 rack: null)], epoch=absent}}
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-02-25T08:23:39-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,463] ERROR
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> stream-thread
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> Encountered the following exception during processing and the thread is going
> to shut down: (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-02-25T08:23:39-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog)
> java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found.
> at
> org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209)
> at
> org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:725)
> {code}
> The root cause is that we accidentally cleanup the partition group map so
> that next time we reboot the task it would miss input partitions.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)