[jira] [Commented] (FLINK-31898) Flink k8s autoscaler does not work as expected

Kyungmin Kim (Jira) Tue, 25 Apr 2023 19:34:19 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-31898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17716490#comment-17716490
 ]


Kyungmin Kim commented on FLINK-31898:
--------------------------------------

[~gyfora] 

After watching more metrics, I found that the busyMsPerSecond metric 
fluctuates a lot (it only ever records either 1000 or zero), and I think this 
results in an incorrect TRUE_PROCESSING_RATE. 

This happened because my test job throttles the number of input records per 
second.
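
To illustrate why the fluctuation matters, here is a minimal sketch. It is not 
the operator's code: the function name is hypothetical, and it assumes the true 
processing rate is estimated as the observed input rate divided by the 
fraction of each second the task reports as busy.

```python
def true_processing_rate(records_per_sec, busy_ms_per_sec):
    """Hypothetical estimate: observed rate / fraction of time spent busy."""
    busy_fraction = busy_ms_per_sec / 1000.0
    if busy_fraction <= 0.0:
        # An idle sample carries no capacity information; treat it as unbounded.
        return float("inf")
    return records_per_sec / busy_fraction


# A throttled task that emits 10 records/s, sampled at different busy values.
for busy_ms in (1000, 0, 250):
    print(busy_ms, true_processing_rate(10, busy_ms))
```

Under that assumption, a sample of 1000 ms gives an estimate equal to the 
throttled rate, a sample of 0 ms gives infinity, and only a steady 
intermediate value such as 250 ms gives a usable capacity estimate.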

I changed my job to accept all inputs, added some delay inside the map 
operator, and changed the configuration as you suggested. 

The autoscaler now works very well :). It finds the optimal parallelism. 

Sorry for the confusion. I think you can close the issue.

By the way, can you let me know when you are planning to release version 
1.5? 

> Flink k8s autoscaler does not work as expected
> ----------------------------------------------
>
>                 Key: FLINK-31898
>                 URL: https://issues.apache.org/jira/browse/FLINK-31898
>             Project: Flink
>          Issue Type: Bug
>          Components: Autoscaler, Kubernetes Operator
>    Affects Versions: kubernetes-operator-1.4.0
>            Reporter: Kyungmin Kim
>            Priority: Major
>         Attachments: image-2023-04-24-10-54-58-083.png, 
> image-2023-04-24-13-27-17-478.png, image-2023-04-24-13-28-15-462.png, 
> image-2023-04-24-13-31-06-420.png, image-2023-04-24-13-41-43-040.png, 
> image-2023-04-24-13-42-40-124.png, image-2023-04-24-13-43-49-431.png, 
> image-2023-04-24-13-44-17-479.png, image-2023-04-24-14-18-12-450.png, 
> image-2023-04-24-16-47-35-697.png
>
>
> Hi, I'm using the Flink k8s autoscaler to automatically deploy jobs with 
> the proper parallelism.
> I was using version 1.4, but I found that it does not scale down properly 
> because TRUE_PROCESSING_RATE becomes NaN when the tasks are idle.
> On the main branch, I saw that the code was fixed to set 
> TRUE_PROCESSING_RATE to positive infinity, which makes scaleFactor a very 
> low value, so I'm now experimentally using a Docker image built from the 
> main branch of the Flink-k8s-operator repository in my job.
> It now scales down properly, but the problem is that it does not converge to 
> the optimal parallelism. It scales down well, but then it jumps back up to a 
> high parallelism. 
>  
> Below is the experimental setup and a figure of the resulting parallelism 
> changes.
>  * about 40 RPS
>  * each task can process 10 TPS (intended throttling)
> !image-2023-04-24-10-54-58-083.png|width=999,height=266!
> Even using the default configuration leads to the same result. What else 
> can I do? Thank you.
>  
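
For reference, the arithmetic behind the quoted description can be sketched as 
follows. This is an illustration under simplifying assumptions, not the 
operator's implementation: both function names are hypothetical, the scale 
factor is assumed to be the target rate divided by the true processing rate, 
and any utilization target or parallelism bounds are ignored.

```python
import math


def scale_factor(target_rate, true_rate):
    """Hypothetical scale factor: how much capacity is needed vs. available."""
    return target_rate / true_rate


def required_parallelism(input_rate, per_task_rate):
    """Smallest number of tasks whose combined rate covers the input rate."""
    return math.ceil(input_rate / per_task_rate)


# A NaN true rate propagates, so no scaling decision can be derived from it.
print(math.isnan(scale_factor(40, float("nan"))))
# An infinite true rate yields a scale factor of zero, which allows scale-down.
print(scale_factor(40, float("inf")))
# The setup in the description: about 40 RPS, 10 TPS per task.
print(required_parallelism(40, 10))
```

With the figures from the description, the parallelism the job should 
converge to under these assumptions is 4.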



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
