[
https://issues.apache.org/jira/browse/KAFKA-19862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Quah updated KAFKA-19862:
------------------------------
Fix Version/s: 4.2.0
Priority: Blocker (was: Major)
> Group coordinator loading may fail when there is concurrent compaction
> ----------------------------------------------------------------------
>
> Key: KAFKA-19862
> URL: https://issues.apache.org/jira/browse/KAFKA-19862
> Project: Kafka
> Issue Type: Bug
> Components: group-coordinator
> Reporter: Sean Quah
> Assignee: Sean Quah
> Priority: Blocker
> Fix For: 4.2.0
>
>
> For consumer and streams groups, we reject replay of
> {{Consumer/StreamsGroupCurrentMemberAssignment}} records when we detect that a
> partition/task is already owned by another member.
> During group coordinator load, we replay the records in
> {{__consumer_offsets}}. When compaction is running concurrently, we can
> load uncompacted data, followed by a newly swapped-in compacted segment,
> followed by the uncompacted head of the log. This allows for situations where
> the record unassigning a partition/task is missed during loading.
> For example:
> We can load a record { Member A is assigned partition X },
> then miss the record { Member A is unassigned partition X },
> then load the record { Member B is assigned partition X }, which fails with
> an exception like:
> {{[GroupCoordinator id=2] Failed to load metadata from __consumer_offsets-4
> with epoch 10 due to java.lang.RuntimeException: Replaying record
> CoordinatorRecord(key=ConsumerGroupCurrentMemberAssignmentKey(groupId='...',
> memberId='ZxHk7W53S_aHFdpxYc-_Jw'),
> value=ApiMessageAndVersion(ConsumerGroupCurrentMemberAssignmentValue(memberEpoch=854659,
> previousMemberEpoch=854633, state=0,
> assignedPartitions=[TopicPartitions(topicId=9lL1aTMuSC22QAXsHgzhew,
> partitions=[1, 2]), TopicPartitions(topicId=RHKM682KQYyOfF1XsOSF1A,
> partitions=[0]), TopicPartitions(topicId=rKx9q1JmS1uP-ug_cj56ug,
> partitions=[0]), TopicPartitions(topicId=I7EtFwesTRubnj-VHClqbQ,
> partitions=[2]), TopicPartitions(topicId=ydAln6IUTZe-od9UUkn3rg,
> partitions=[2])], partitionsPendingRevocation=[]) at version 0)) from
> __consumer_offsets-4 at offset 3889549 with producer id -1 and producer epoch
> -1 failed..}}
> {{java.lang.IllegalStateException: Cannot set the epoch of
> RHKM682KQYyOfF1XsOSF1A-0 to 854659 because the partition is still owned at
> epoch 853490}}
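
The failure sequence described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: {{ReplaySketch}}, {{Assignment}} and {{replayAll}} are hypothetical names invented for illustration, not the group coordinator's actual classes, and the ownership check is a simplified stand-in for the real replay logic. Only the wording of the error message is modelled on the exception quoted above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of replaying current-assignment records.
class ReplaySketch {
    // Stand-in for a *GroupCurrentMemberAssignment record (assumed shape).
    static final class Assignment {
        final String memberId;
        final int memberEpoch;
        final Set<String> partitions;

        Assignment(String memberId, int memberEpoch, String... partitions) {
            this.memberId = memberId;
            this.memberEpoch = memberEpoch;
            this.partitions = new HashSet<>(Arrays.asList(partitions));
        }
    }

    private final Map<String, Set<String>> ownedByMember = new HashMap<>();
    private final Map<String, Integer> partitionEpochs = new HashMap<>();

    void replay(Assignment rec) {
        Set<String> previous = ownedByMember.getOrDefault(rec.memberId, new HashSet<>());
        // Reject the record if any partition is still owned by a different member.
        for (String p : rec.partitions) {
            Integer ownerEpoch = partitionEpochs.get(p);
            if (ownerEpoch != null && !previous.contains(p)) {
                throw new IllegalStateException("Cannot set the epoch of " + p + " to "
                    + rec.memberEpoch + " because the partition is still owned at epoch "
                    + ownerEpoch);
            }
        }
        // Release what this member owned before, then record its new ownership.
        for (String p : previous) {
            partitionEpochs.remove(p);
        }
        for (String p : rec.partitions) {
            partitionEpochs.put(p, rec.memberEpoch);
        }
        ownedByMember.put(rec.memberId, rec.partitions);
    }

    // Replays records in order; returns null on success or the failure message.
    static String replayAll(List<Assignment> records) {
        ReplaySketch state = new ReplaySketch();
        try {
            for (Assignment rec : records) {
                state.replay(rec);
            }
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        Assignment assignA = new Assignment("A", 1, "X");
        Assignment unassignA = new Assignment("A", 2);   // A gives up X
        Assignment assignB = new Assignment("B", 3, "X");

        // Complete log: every record is seen, so the replay succeeds.
        System.out.println(replayAll(Arrays.asList(assignA, unassignA, assignB)));
        // The unassignment is missed during loading, so the replay is rejected.
        System.out.println(replayAll(Arrays.asList(assignA, assignB)));
    }
}
```

In this model the second replay fails because the coordinator state still shows member A owning X when member B's assignment arrives, which mirrors the "still owned at epoch" error in the log above.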