[ 
https://issues.apache.org/jira/browse/FLINK-34655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826923#comment-17826923
 ] 

Rui Fan commented on FLINK-34655:
---------------------------------

{quote}I understand the idea behind providing suggestions. However, it is 
difficult to assess the quality of Autoscaling decisions without applying them 
automatically. The reason is that suggestions become stale very quickly if the 
load pattern is not completely static. Even for static load patterns, if the 
user doesn't redeploy in a matter of minutes, the suggestions might already be 
stale again when the number of pending records increased too much. In any case, 
production load patterns are rarely static which means that autoscaling will 
inevitably trigger multiple times a day, but that is where its real power is 
unleashed. It would be great to hear about any concerns your users have for 
turning on automatic scaling. {quote}

Thanks for pointing it out! 

It is indeed difficult to observe dynamic changes in the load. But users 
don't want to adopt a large feature without observing it first. This applies 
not only to the autoscaler but to all major features: users need to evaluate 
them thoroughly before applying them to a production environment. 

Although the parallelism may change dynamically, historical experience shows 
that users care most about whether the parallelism is reasonable during peak 
periods. Currently, the JDBC event handler records all ScalingReports. Each 
ScalingReport includes its create time, so users can check them conveniently. 

{quote}We've been operating it in production for about a year now.{quote}

It's great to see that your users have been using autoscaler for a long time. I 
believe it will give the entire community more confidence in using the 
autoscaler.

{quote}Back to the issue here, should we think about a patch release for 1.15 / 
1.16 to add support for overriding vertex parallelism?{quote}

I agree with [~gyfora]: 1.15 and 1.16 won't get any more releases, so the 
community doesn't need to backport these changes. Users who want these 
features should upgrade to a newer version or cherry-pick the changes into 
their internal Flink version.

> Autoscaler doesn't work for flink 1.15
> --------------------------------------
>
>                 Key: FLINK-34655
>                 URL: https://issues.apache.org/jira/browse/FLINK-34655
>             Project: Flink
>          Issue Type: Bug
>          Components: Autoscaler
>            Reporter: Rui Fan
>            Assignee: Rui Fan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: kubernetes-operator-1.8.0
>
>
> flink-kubernetes-operator is committed to supporting the latest 4 flink minor 
> versions, and autoscaler is a part of flink-kubernetes-operator. Currently, 
> the latest 4 flink minor versions are 1.15, 1.16, 1.17 and 1.18.
> But autoscaler doesn't work for flink 1.15.
> h2. Root cause: 
> * FLINK-28310 added some properties in IOMetricsInfo in flink-1.16
> * IOMetricsInfo is a part of JobDetailsInfo
> * JobDetailsInfo is necessary for autoscaler [1]
> * flink's RestClient doesn't allow any property to be missing when 
> deserializing the JSON
> That means that a RestClient newer than 1.15 cannot fetch JobDetailsInfo for 
> 1.15 jobs.
> h2. How to fix it properly?
> - [[FLINK-34655](https://issues.apache.org/jira/browse/FLINK-34655)] Copy 
> IOMetricsInfo to flink-autoscaler-standalone module
> - Remove the copy once 1.15 is no longer supported
> [1] 
> https://github.com/apache/flink-kubernetes-operator/blob/ede1a610b3375d31a2e82287eec67ace70c4c8df/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/ScalingMetricCollector.java#L109
> [2] 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance
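
The root cause described in the issue above can be illustrated with a minimal, language-neutral sketch. It is not Flink's RestClient or the actual IOMetricsInfo class: the class names, the field names (including `new_metric`, which stands in for a property added in 1.16) and the `decode` helper are all hypothetical. It only shows why a decoder that rejects missing properties fails on a payload from an older server, and how a module-local copy with a tolerant definition could avoid that.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class IOMetricsStrict:
    """Hypothetical client-side model where every property is required."""
    read_bytes: int
    write_bytes: int
    new_metric: int  # stands in for a property added in a newer server version


@dataclass
class IOMetricsCompat:
    """Hypothetical module-local copy that tolerates the missing property."""
    read_bytes: int
    write_bytes: int
    new_metric: int = 0  # default used when an older server omits it


def decode(cls, payload: dict):
    """Build cls from payload, rejecting any missing required property."""
    required = [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]
    missing = [name for name in required if name not in payload]
    if missing:
        raise ValueError(f"missing properties: {missing}")
    return cls(**payload)


# Payload shaped like an older server's response (assumed shape).
old_payload = {"read_bytes": 10, "write_bytes": 20}

try:
    decode(IOMetricsStrict, old_payload)
    strict_failed = False
except ValueError:
    strict_failed = True

compat = decode(IOMetricsCompat, old_payload)
```

With the strict model the old payload is rejected, while the tolerant copy decodes it and falls back to the default. Whether the real fix uses defaults, nullable fields or another mechanism is not stated in this message.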



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
