[
https://issues.apache.org/jira/browse/FLINK-35594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17854757#comment-17854757
]
Rui Fan commented on FLINK-35594:
---------------------------------
This Jira may be a duplicate of
https://issues.apache.org/jira/browse/FLINK-33977
> Downscaling doesn't release TaskManagers.
> -----------------------------------------
>
> Key: FLINK-35594
> URL: https://issues.apache.org/jira/browse/FLINK-35594
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.18.1
> Environment: * Flink 1.18.1 (Java 11, Temurin).
> * Kubernetes Operator 1.8
> * Kubernetes version v1.28.9-eks-036c24b (AWS EKS).
>
> Autoscaling configuration:
> {code:yaml}
> jobmanager.scheduler: adaptive
> job.autoscaler.enabled: "true"
> job.autoscaler.metrics.window: 15m
> job.autoscaler.stabilization.interval: 15m
> job.autoscaler.scaling.effectiveness.threshold: 0.2
> job.autoscaler.target.utilization: "0.75"
> job.autoscaler.target.utilization.boundary: "0.25"
> job.autoscaler.metrics.busy-time.aggregator: "AVG"
> job.autoscaler.restart.time-tracking.enabled: "true"{code}
> Reporter: Aviv Dozorets
> Priority: Major
> Attachments: Screenshot 2024-06-10 at 12.50.37 PM.png
>
>
> (Follow-up of a Slack conversation in the #troubleshooting channel.)
> Recently I've observed behavior that should be improved:
> A Flink DataStream job that runs with the autoscaler (backed by the Kubernetes
> operator) and the Adaptive Scheduler doesn't release nodes (TaskManagers) when
> scaling down. In my example the job started with an initial parallelism of 64
> on 4 TaskManagers with 16 cores each (1:1 core:slot) and was scaled down to a
> parallelism of 16.
> My expectation: 1 TaskManager should be up and running.
> Reality: all 4 initial TaskManagers are still running, with varying and unequal
> numbers of available slots.
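> A minimal sketch of the arithmetic behind that expectation, using only the
> numbers above (illustration only, not Flink code):
> {code:java}
> // Illustration: expected TaskManager count after the scale-down described above.
> public class ExpectedTaskManagers {
>     public static void main(String[] args) {
>         int slotsPerTaskManager = 16; // 16 cores per TM, 1:1 core:slot
>         int newParallelism = 16;      // parallelism after downscaling (was 64)
>         // ceil(16 / 16) = 1, so 3 of the 4 TaskManagers should become idle and releasable.
>         int requiredTaskManagers =
>                 (newParallelism + slotsPerTaskManager - 1) / slotsPerTaskManager;
>         System.out.println("TaskManagers required: " + requiredTaskManagers);
>     }
> }{code}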
>
> I didn't find an existing configuration option that changes this behavior.
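> For reference, a sketch of the general idle-resource release options that exist
> in Flink; it's not clear to me whether they apply to this adaptive-scheduler
> case (placeholder values in milliseconds, not recommendations):
> {code:yaml}
> # Placeholder values; these options govern releasing idle slots / idle TaskManagers in general.
> slot.idle.timeout: 50000
> resourcemanager.taskmanager-timeout: 30000{code}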
--
This message was sent by Atlassian Jira
(v8.20.10#820010)