[
https://issues.apache.org/jira/browse/KAFKA-9607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guozhang Wang resolved KAFKA-9607.
----------------------------------
Fix Version/s: 2.6.0
Resolution: Fixed
> Should not clear partition queue during task close
> --------------------------------------------------
>
> Key: KAFKA-9607
> URL: https://issues.apache.org/jira/browse/KAFKA-9607
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Reporter: Boyang Chen
> Assignee: Boyang Chen
> Priority: Major
> Fix For: 2.6.0
>
>
> We detected an issue where a corrupted task failed to revive:
> {code:java}
> [2020-02-25T08:23:38-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,137] INFO
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> stream-thread
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] Handle
> new assignment with:
> New active tasks: [0_0, 3_1]
> New standby tasks: []
> Existing active tasks: [0_0]
> Existing standby tasks: []
> (org.apache.kafka.streams.processor.internals.TaskManager)
> [2020-02-25T08:23:38-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,138] INFO
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> [Consumer
> clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
> groupId=stream-soak-test] Adding newly assigned partitions:
> k8sName-id-repartition-1
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-02-25T08:23:38-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,138] INFO
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> stream-thread
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1] State
> transition from RUNNING to PARTITIONS_ASSIGNED
> (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-02-25T08:23:39-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,419] WARN
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> stream-thread
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> Encountered org.apache.kafka.clients.consumer.OffsetOutOfRangeException
> fetching records from restore consumer for partitions
> [stream-soak-test-KSTREAM-AGGREGATE-STATE-STORE-0000000049-changelog-1], it
> is likely that the consumer's position has fallen out of the topic partition
> offset range because the topic was truncated or compacted on the broker,
> marking the corresponding tasks as corrupted and re-initializing it later.
> (org.apache.kafka.streams.processor.internals.StoreChangelogReader)
> [2020-02-25T08:23:38-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,139] INFO
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> [Consumer
> clientId=stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1-consumer,
> groupId=stream-soak-test] Setting offset for partition
> k8sName-id-repartition-1 to the committed offset
> FetchPosition{offset=3592242, offsetEpoch=Optional.empty,
> currentLeader=LeaderAndEpoch{leader=Optional[ip-172-31-25-115.us-west-2.compute.internal:9092
> (id: 1003 rack: null)], epoch=absent}}
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2020-02-25T08:23:39-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog) [2020-02-25
> 16:23:38,463] ERROR
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> stream-thread
> [stream-soak-test-8f2124ec-8bd0-410a-8e3d-f202a32ab774-StreamThread-1]
> Encountered the following exception during processing and the thread is going
> to shut down: (org.apache.kafka.streams.processor.internals.StreamThread)
> [2020-02-25T08:23:39-08:00]
> (streams-soak-trunk_soak_i-06305ad57801079ae_streamslog)
> java.lang.IllegalStateException: Partition k8sName-id-repartition-1 not found.
> at
> org.apache.kafka.streams.processor.internals.PartitionGroup.setPartitionTime(PartitionGroup.java:99)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.initializeTaskTime(StreamTask.java:651)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.initializeMetadata(StreamTask.java:631)
> at
> org.apache.kafka.streams.processor.internals.StreamTask.completeRestoration(StreamTask.java:209)
> at
> org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:270)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:834)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:749)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:725)
> {code}
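> For context, the IllegalStateException above is thrown when the task looks up a
> partition whose record queue is no longer in its partition group. A minimal
> sketch of that failure mode, assuming a simplified queue map (the class and
> method names here are illustrative, not the actual PartitionGroup internals):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> import org.apache.kafka.common.TopicPartition;
>
> // Simplified stand-in for a task's partition queue map.
> class SimplifiedPartitionGroup {
>     private final Map<TopicPartition, Long> partitionTimes = new HashMap<>();
>
>     void addPartition(final TopicPartition partition) {
>         partitionTimes.put(partition, -1L);
>     }
>
>     // If close() clears the map, a later revive of the same task
>     // no longer knows about its input partitions ...
>     void close() {
>         partitionTimes.clear();
>     }
>
>     // ... so initializing the partition time on revive hits this check,
>     // which is the path seen in the stack trace above.
>     void setPartitionTime(final TopicPartition partition, final long time) {
>         if (!partitionTimes.containsKey(partition)) {
>             throw new IllegalStateException("Partition " + partition + " not found.");
>         }
>         partitionTimes.put(partition, time);
>     }
> }
> {code}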
> The root cause is that we accidentally clear the partition group map during
> task close, so the next time we revive the task it is missing its input
> partitions.
> By not clearing the partition group we may incur a slight GC overhead, which
> is acceptable. In terms of correctness, there is currently no way to revive
> the task with its partitions re-assigned.
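> A minimal sketch of the fix direction, under the same simplified model as
> above (illustrative names, not the actual Kafka Streams change): on close,
> drop only the buffered data and keep the partition entries, so a revived
> task can still initialize its partition time.
> {code:java}
> import java.util.ArrayDeque;
> import java.util.Deque;
> import java.util.HashMap;
> import java.util.Map;
>
> import org.apache.kafka.common.TopicPartition;
>
> // Illustrative partition group that survives a close/revive cycle.
> class ReusablePartitionGroup {
>     private final Map<TopicPartition, Deque<byte[]>> queues = new HashMap<>();
>
>     void addPartition(final TopicPartition partition) {
>         queues.putIfAbsent(partition, new ArrayDeque<>());
>     }
>
>     // On close: clear only the buffered records; keeping the (empty) queues
>     // alive is the small GC overhead mentioned above.
>     void clearBufferedRecords() {
>         queues.values().forEach(Deque::clear);
>     }
>
>     // On revive, the partition is still present, so this no longer throws.
>     void setPartitionTime(final TopicPartition partition, final long time) {
>         if (!queues.containsKey(partition)) {
>             throw new IllegalStateException("Partition " + partition + " not found.");
>         }
>         // partition-time bookkeeping would go here
>     }
> }
> {code}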