[ 
https://issues.apache.org/jira/browse/KAFKA-20563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushant Mahajan updated KAFKA-20563:
------------------------------------
    Description: 
[https://develocity.apache.org/scans/tests?search.buildToolType=gradle&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Asia%2FCalcutta&tests.container=org.apache.kafka.clients.consumer.ShareConsumerRackAwareTest&tests.test=testShareConsumerWithRackAwareAssignor(ClusterInstance)%5B1%5D]

Preliminary analysis:

  1. The test calls alterPartitionReassignments to move partitions between
     brokers/racks.
  2. During the transition, a share group heartbeat fires. Share groups have
     an extra trigger, initializedAssignmentPending()
     (GroupMetadataManager.java), that forces assignment recomputation on
     every heartbeat while there are unassigned initialized partitions.
     Combined with SHARE_GROUP_ASSIGNMENT_INTERVAL_MS_CONFIG=0, this means
     every heartbeat triggers the assignor.
  3. The RackAwareAssignor runs against transitional metadata in which a
     partition's rack set doesn't match any member, and it throws
     PartitionAssignorException.
  4. GroupMetadataManager.maybeUpdateTargetAssignment wraps it as
     UnknownServerException.
  5. AbstractHeartbeatRequestManager treats this as a fatal error and moves
     the consumer member permanently to the FATAL state.
  6. On test teardown, ShareConsumerImpl.close() tries to leave the group,
     encounters the UnknownServerException in the background event queue,
     and throws KafkaException("Failed to close Kafka share consumer").
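Steps 3 to 5 can be illustrated with a small self-contained simulation. This is only a sketch of the suspected failure chain, not Kafka code: the class ShareGroupFailureSketch, its methods, the MemberState enum, and the two exception classes are all hypothetical stand-ins that borrow the names from the analysis above, and the toy assignor simply fails when no member's rack hosts a replica of a partition.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of the failure chain; none of this is Kafka's real code.
final class ShareGroupFailureSketch {

    // Toy stand-ins for the Kafka exception types named in the analysis.
    static class PartitionAssignorException extends RuntimeException {
        PartitionAssignorException(String message) { super(message); }
    }

    static class UnknownServerException extends RuntimeException {
        UnknownServerException(String message, Throwable cause) { super(message, cause); }
    }

    enum MemberState { STABLE, FATAL }

    // Step 3 (toy assignor): give each partition to the first member, ordered
    // by member id, whose rack hosts a replica. Fail if no member rack matches.
    static Map<Integer, String> assign(Map<Integer, Set<String>> partitionRacks,
                                       Map<String, String> memberRacks) {
        Map<Integer, String> assignment = new TreeMap<>();
        Map<String, String> sortedMembers = new TreeMap<>(memberRacks);
        for (Map.Entry<Integer, Set<String>> partition : partitionRacks.entrySet()) {
            String owner = null;
            for (Map.Entry<String, String> member : sortedMembers.entrySet()) {
                if (partition.getValue().contains(member.getValue())) {
                    owner = member.getKey();
                    break;
                }
            }
            if (owner == null) {
                throw new PartitionAssignorException(
                    "No member in racks " + partition.getValue()
                        + " for partition " + partition.getKey());
            }
            assignment.put(partition.getKey(), owner);
        }
        return assignment;
    }

    // Step 4 (stand-in for the coordinator): the assignor failure is wrapped
    // in a generic server error, which hides that it may be transient.
    static Map<Integer, String> computeTargetAssignment(Map<Integer, Set<String>> partitionRacks,
                                                        Map<String, String> memberRacks) {
        try {
            return assign(partitionRacks, memberRacks);
        } catch (PartitionAssignorException e) {
            throw new UnknownServerException("Failed to compute a new target assignment", e);
        }
    }

    // Step 5 (stand-in for the client heartbeat handler): the generic server
    // error is treated as unrecoverable, so the member ends up FATAL.
    static MemberState onHeartbeat(Map<Integer, Set<String>> partitionRacks,
                                   Map<String, String> memberRacks) {
        try {
            computeTargetAssignment(partitionRacks, memberRacks);
            return MemberState.STABLE;
        } catch (UnknownServerException e) {
            return MemberState.FATAL;
        }
    }
}
```

In this model, a heartbeat that arrives while partition 0 is only on rack-c and the sole member is on rack-a returns FATAL, while the same heartbeat after the partition is back on rack-a returns STABLE. That is the timing dependence the analysis suspects behind the flakiness.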

  was:
https://develocity.apache.org/scans/tests?search.buildToolType=gradle&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.timeZoneId=Asia%2FCalcutta&tests.container=org.apache.kafka.clients.consumer.ShareConsumerRackAwareTest&tests.test=testShareConsumerWithRackAwareAssignor(ClusterInstance)%5B1%5D

Preliminary analysis:

  1. Test calls alterPartitionReassignments to move partitions between
     brokers/racks.
  2. During the transition, a share group heartbeat fires. Share groups have
     an extra trigger, initializedAssignmentPending()
     (GroupMetadataManager.java), that forces assignment recomputation on
     every heartbeat when there are unassigned initialized partitions.
     Combined with SHARE_GROUP_ASSIGNMENT_INTERVAL_MS_CONFIG=0, this means
     every heartbeat triggers the assignor.
  3. The RackAwareAssignor runs against transitional metadata where a
     partition's rack set doesn't match any member. It throws
     PartitionAssignorException.
  4. GroupMetadataManager.maybeUpdateTargetAssignment wraps it as
     UnknownServerException.
  5. AbstractHeartbeatRequestManager treats this as a fatal error,
     transitioning the consumer member to FATAL state permanently.
  6. On test teardown, ShareConsumerImpl.close() tries to leave the group,
     encounters the UnknownServerException in the background event queue
     (not filtered by the KAFKA-19229 fix), and throws
     KafkaException("Failed to close Kafka share consumer").


> Flaky test ShareConsumerRackAwareTest.testShareConsumerWithRackAwareAssignor
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-20563
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20563
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Sushant Mahajan
>            Assignee: Sushant Mahajan
>            Priority: Minor


