Xin Gao created FLINK-39981:
-------------------------------

             Summary: Autoscaler cannot trigger source rebalance after active 
split removal on dynamic Kafka source
                 Key: FLINK-39981
                 URL: https://issues.apache.org/jira/browse/FLINK-39981
             Project: Flink
          Issue Type: Improvement
          Components: Autoscaler, Kubernetes Operator
            Reporter: Xin Gao


The autoscaler derives the Kafka source partition count by enumerating metric 
names matching:

KafkaSourceReader.topic.<topic>.partition.<id>.currentOffset

After DynamicKafkaSource removes Kafka clusters/topics/splits due to metadata 
changes, removed split metric names may remain visible through the aggregated 
metrics endpoint. The autoscaler therefore continues counting removed 
partitions as active partitions.
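
A minimal sketch of this name-based counting, for illustration only. The class 
name, method name and regular expression are assumptions made for this example 
and are not copied from the autoscaler source; the topic names are made up. It 
shows why a count derived purely from metric names cannot tell a removed split 
from an active one: any name that is still listed is counted.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PartitionCountSketch {
    // Assumed pattern for the metric name quoted above (hypothetical, not the
    // autoscaler's own regex). Group 1 is the topic, group 2 the partition id.
    private static final Pattern CURRENT_OFFSET = Pattern.compile(
            "KafkaSourceReader\\.topic\\.(.+)\\.partition\\.(\\d+)\\.currentOffset$");

    /** Counts distinct (topic, partition) pairs found in the given metric names. */
    static int countPartitions(List<String> metricNames) {
        Set<String> partitions = new HashSet<>();
        for (String name : metricNames) {
            Matcher m = CURRENT_OFFSET.matcher(name);
            if (m.find()) {
                partitions.add(m.group(1) + "/" + m.group(2));
            }
        }
        return partitions.size();
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                // Splits that are still assigned.
                "KafkaSourceReader.topic.orders.partition.0.currentOffset",
                "KafkaSourceReader.topic.orders.partition.1.currentOffset",
                // Split that was removed, but whose metric name is still listed.
                "KafkaSourceReader.topic.payments.partition.0.currentOffset",
                // Unrelated metric, ignored by the pattern.
                "KafkaSourceReader.commitsSucceeded");
        // The removed split is counted together with the active ones.
        System.out.println(countPartitions(names));
    }
}
```

In this sketch the only input is the list of names, so nothing distinguishes a 
stale entry from a live one.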

*Example:*
 - Source parallelism: 6
 - Active Kafka partitions before metadata update: 8
 - Metadata update removes a Kafka cluster containing 5 partitions
 - Active Kafka partitions after update: 3
 - Removed partition currentOffset metric names remain visible
 - Autoscaler still calculates numSourcePartitions = 8
 - Source parallelism remains 6

*Expected behavior:*

The autoscaler should calculate numSourcePartitions from currently {*}active 
Kafka splits only{*}. After the metadata update, numSourcePartitions should 
become 3 and source parallelism should be capped at <= 3 on the next autoscaler 
evaluation.
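
The arithmetic of the example, as a simplified sketch. It assumes the cap is a 
plain minimum of the desired parallelism and the partition count; the class and 
method names are hypothetical, and the real autoscaler applies further 
adjustments that are left out here.

```java
final class SourceParallelismCapSketch {
    /**
     * Caps the source parallelism at the number of source partitions.
     * A non-positive partition count is treated as "unknown" and applies no cap
     * (an assumption made for this sketch).
     */
    static int capParallelism(int desiredParallelism, int numSourcePartitions) {
        if (numSourcePartitions <= 0) {
            return desiredParallelism;
        }
        return Math.min(desiredParallelism, numSourcePartitions);
    }

    public static void main(String[] args) {
        int parallelism = 6;
        // Stale count (removed partitions still included): no scale-down.
        System.out.println(capParallelism(parallelism, 8)); // prints 6
        // Count from active splits only: capped at 3.
        System.out.println(capParallelism(parallelism, 3)); // prints 3
    }
}
```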

*Actual behavior:*

The autoscaler continues counting stale currentOffset metric names and does not 
scale down. The source remains {*}over-parallelized with empty subtasks{*}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
