[
https://issues.apache.org/jira/browse/KAFKA-20563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sushant Mahajan updated KAFKA-20563:
------------------------------------
Description:
https://develocity.apache.org/scans/tests?search.buildToolType=gradle&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Asia%2FCalcutta&tests.container=org.apache.kafka.clients.consumer.ShareConsumerRackAwareTest&tests.test=testShareConsumerWithRackAwareAssignor(ClusterInstance)%5B1%5D
Preliminary analysis:
1. The test calls alterPartitionReassignments to move partitions between
brokers/racks.
2. While the reassignment is in progress, a share group heartbeat fires.
Share groups have an extra trigger, initializedAssignmentPending()
(GroupMetadataManager.java), that forces the assignment to be recomputed on
every heartbeat while there are unassigned initialized partitions. Combined
with SHARE_GROUP_ASSIGNMENT_INTERVAL_MS_CONFIG=0, this means every heartbeat
invokes the assignor.
3. The RackAwareAssignor runs against transitional metadata in which a
partition's rack set doesn't match any member's rack, so it throws
PartitionAssignorException.
4. GroupMetadataManager.maybeUpdateTargetAssignment wraps that exception in
an UnknownServerException.
5. AbstractHeartbeatRequestManager treats this as a fatal error and moves
the consumer member to the FATAL state permanently.
6. On test teardown, ShareConsumerImpl.close() tries to leave the group,
encounters the UnknownServerException in the background event queue (it is
not filtered by the KAFKA-19229 fix), and throws
KafkaException("Failed to close Kafka share consumer").
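To make the chain in steps 3 to 5 easier to follow, here is a minimal,
JDK-only sketch of it. It is a toy model and not Kafka code: the class
RackMismatchSketch, the exception types, the MemberState enum and the
"first member on a matching rack" rule are all hypothetical stand-ins, and
the real broker-to-client hop (an error code in the heartbeat response) is
collapsed into an in-process exception.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of steps 3-5. Every name here is hypothetical, not a Kafka API.
public class RackMismatchSketch {

    // Stand-in for the assignor's failure (step 3).
    public static class AssignorFailure extends RuntimeException {
        public AssignorFailure(String message) { super(message); }
    }

    // Stand-in for the wrapped server-side error (step 4).
    public static class UnknownServerFailure extends RuntimeException {
        public UnknownServerFailure(String message, Throwable cause) { super(message, cause); }
    }

    public enum MemberState { STABLE, FATAL }

    // Assumed rule: a partition goes to the first member (by id) whose rack
    // is in the partition's rack set; if no member qualifies, assignment fails.
    public static Map<Integer, String> assign(Map<Integer, Set<String>> partitionRacks,
                                              Map<String, String> memberRacks) {
        Map<Integer, Set<String>> partitions = new TreeMap<>(partitionRacks);
        Map<String, String> members = new TreeMap<>(memberRacks);
        Map<Integer, String> target = new TreeMap<>();
        for (Map.Entry<Integer, Set<String>> partition : partitions.entrySet()) {
            String owner = null;
            for (Map.Entry<String, String> member : members.entrySet()) {
                if (partition.getValue().contains(member.getValue())) {
                    owner = member.getKey();
                    break;
                }
            }
            if (owner == null) {
                throw new AssignorFailure(
                    "No member shares a rack with partition " + partition.getKey());
            }
            target.put(partition.getKey(), owner);
        }
        return target;
    }

    // Coordinator side: the assignor failure is rethrown as a generic server error.
    public static Map<Integer, String> updateTargetAssignment(
            Map<Integer, Set<String>> partitionRacks, Map<String, String> memberRacks) {
        try {
            return assign(partitionRacks, memberRacks);
        } catch (AssignorFailure e) {
            throw new UnknownServerFailure("Failed to compute a new target assignment", e);
        }
    }

    // Client side: a generic server error on heartbeat is treated as fatal.
    public static MemberState heartbeat(Map<Integer, Set<String>> partitionRacks,
                                        Map<String, String> memberRacks) {
        try {
            updateTargetAssignment(partitionRacks, memberRacks);
            return MemberState.STABLE;
        } catch (UnknownServerFailure e) {
            return MemberState.FATAL;
        }
    }

    public static void main(String[] args) {
        Map<String, String> members = Map.of("member-a", "rack-1", "member-b", "rack-2");
        // Steady state: every partition has a replica on some member's rack.
        Map<Integer, Set<String>> steady = Map.of(0, Set.of("rack-1"), 1, Set.of("rack-2"));
        // Mid-reassignment: partition 1 is only on rack-3, where no member lives.
        Map<Integer, Set<String>> transitional = Map.of(0, Set.of("rack-1"), 1, Set.of("rack-3"));
        System.out.println(heartbeat(steady, members));       // prints STABLE
        System.out.println(heartbeat(transitional, members)); // prints FATAL
    }
}
```

In this model a single heartbeat that lands on the transitional metadata is
enough to reach FATAL, which matches the flakiness: the test only fails when
a heartbeat happens to fire inside the reassignment window.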
> Flaky test ShareConsumerRackAwareTest.testShareConsumerWithRackAwareAssignor
> ----------------------------------------------------------------------------
>
> Key: KAFKA-20563
> URL: https://issues.apache.org/jira/browse/KAFKA-20563
> Project: Kafka
> Issue Type: Bug
> Reporter: Sushant Mahajan
> Assignee: Sushant Mahajan
> Priority: Minor
>