[ 
https://issues.apache.org/jira/browse/KAFKA-20563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schofield updated KAFKA-20563:
-------------------------------------
    Description: 
[https://develocity.apache.org/scans/tests?search.buildToolType=gradle&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Asia%2FCalcutta&tests.container=org.apache.kafka.clients.consumer.ShareConsumerRackAwareTest&tests.test=testShareConsumerWithRackAwareAssignor(ClusterInstance)%5B1%5D]

Preliminary analysis:

  1. The test calls alterPartitionReassignments to move partitions between 
brokers/racks.
  2. During the transition, a share group heartbeat fires. Share groups have an 
extra trigger, initializedAssignmentPending() (GroupMetadataManager.java), that 
forces assignment recomputation on every heartbeat while there are unassigned 
initialized partitions. Combined with 
SHARE_GROUP_ASSIGNMENT_INTERVAL_MS_CONFIG=0, this means every heartbeat 
triggers the assignor.
  3. The RackAwareAssignor runs against transitional metadata in which a 
partition's rack set doesn't match any member's rack, so it throws 
PartitionAssignorException. The group coordinator logs:

 [2026-05-10 21:15:09,514] ERROR [GroupCoordinator id=0] Operation 
share-group-heartbeat with ShareGroupHeartbeatRequestData(groupId='group0', 
memberId='mMvKOe5MR0aBoDlFTKTTnA', memberEpoch=10, rackId=null, 
subscribedTopicNames=null) hit an unexpected exception: 
org.apache.kafka.common.errors.UnknownServerException: Failed to compute a new 
target assignment for epoch 11: No member found for racks [rack2] for partition 
0 of topic TDeVaIP_Q2OWvedEfXl_ng. 
(org.apache.kafka.coordinator.group.GroupCoordinatorService:54)        
java.util.concurrent.CompletionException: 
org.apache.kafka.common.errors.UnknownServerException: Failed to compute a new 
target assignment for epoch 11: No member found for racks [rack2] for partition 
0 of topic TDeVaIP_Q2OWvedEfXl_ng


  4. GroupMetadataManager.maybeUpdateTargetAssignment wraps it as 
UnknownServerException.
  5. AbstractHeartbeatRequestManager treats this as a fatal error, 
transitioning the consumer member to FATAL state permanently.
  6. On test teardown, ShareConsumerImpl.close() tries to leave the group, 
encounters the UnknownServerException in the background event queue, and throws 
KafkaException("Failed to close Kafka share consumer").
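
To make steps 3 and 4 concrete, here is a minimal Java sketch of the failure 
chain. It is a simplified model, not the Kafka code: RackAssignmentSketch, 
AssignorFailure, UnknownServerFailure, assign and computeTargetAssignment are 
hypothetical names invented for illustration, and the real assignor and 
GroupMetadataManager logic is considerably more involved.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of steps 3 and 4. None of these types are Kafka classes. */
final class RackAssignmentSketch {

    /** Stand-in for the assignor failure described in step 3. */
    static final class AssignorFailure extends RuntimeException {
        AssignorFailure(String message) {
            super(message);
        }
    }

    /** Stand-in for the wrapped error described in step 4. */
    static final class UnknownServerFailure extends RuntimeException {
        UnknownServerFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Picks the first member (by member id) whose rack is one of the partition's racks.
     * Throws if no member matches, which is the situation during the reassignment.
     */
    static String assign(int partition, Set<String> partitionRacks, Map<String, String> memberRacks) {
        return memberRacks.entrySet().stream()
            .filter(entry -> partitionRacks.contains(entry.getValue()))
            .map(Map.Entry::getKey)
            .sorted()
            .findFirst()
            .orElseThrow(() -> new AssignorFailure(
                "No member found for racks " + partitionRacks + " for partition " + partition));
    }

    /** Wraps an assignor failure in a generic server error, as step 4 describes. */
    static String computeTargetAssignment(int epoch, int partition,
                                          Set<String> partitionRacks,
                                          Map<String, String> memberRacks) {
        try {
            return assign(partition, partitionRacks, memberRacks);
        } catch (AssignorFailure e) {
            throw new UnknownServerFailure(
                "Failed to compute a new target assignment for epoch " + epoch + ": " + e.getMessage(), e);
        }
    }
}
```

With members on rack0 and rack1 and a partition whose only rack is rack2, 
computeTargetAssignment throws the generic error. In this sketch, as in the 
analysis above, the caller sees only the wrapped error and so cannot tell that 
the underlying condition may be transient.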


> Flaky test ShareConsumerRackAwareTest.testShareConsumerWithRackAwareAssignor
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-20563
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20563
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 4.3.0
>            Reporter: Sushant Mahajan
>            Assignee: Andrew Schofield
>            Priority: Minor
>



