[ https://issues.apache.org/jira/browse/FLINK-34655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17826923#comment-17826923 ]
Rui Fan commented on FLINK-34655:
---------------------------------

{quote}I understand the idea behind providing suggestions. However, it is difficult to assess the quality of Autoscaling decisions without applying them automatically. The reason is that suggestions become stale very quickly if the load pattern is not completely static. Even for static load patterns, if the user doesn't redeploy in a matter of minutes, the suggestions might already be stale again when the number of pending records increased too much. In any case, production load patterns are rarely static which means that autoscaling will inevitably trigger multiple times a day, but that is where its real power is unleashed. It would be great to hear about any concerns your users have for turning on automatic scaling.
{quote}
Thanks for pointing it out! It is indeed difficult to observe the dynamic changes of the load. But users don't want to adopt a major feature without observing it first. This applies not only to the autoscaler but to all major features: users need to do enough research before applying them to a production environment.

Although the parallelism may change dynamically, historical experience shows that users are more concerned about whether the parallelism is reasonable during peak periods. Currently, the JDBC event handler records all ScalingReports. Each ScalingReport includes its create time, so users can check them conveniently.

{quote}We've been operating it in production for about a year now.{quote}
It's great to see that your users have been using the autoscaler for a long time. I believe it will give the entire community more confidence in using the autoscaler.

{quote}Back to the issue here, should we think about a patch release for 1.15 / 1.16 to add support for overriding vertex parallelism?{quote}
I agree with [~gyfora]: 1.15 and 1.16 won't be released anymore, so the community doesn't need to backport these changes.
If some users want to use these features, it's better to upgrade to a newer version or cherry-pick the changes into their internal Flink version.

> Autoscaler doesn't work for flink 1.15
> --------------------------------------
>
>                 Key: FLINK-34655
>                 URL: https://issues.apache.org/jira/browse/FLINK-34655
>             Project: Flink
>          Issue Type: Bug
>          Components: Autoscaler
>            Reporter: Rui Fan
>            Assignee: Rui Fan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: kubernetes-operator-1.8.0
>
>
> flink-kubernetes-operator is committed to supporting the latest 4 Flink minor
> versions, and the autoscaler is a part of flink-kubernetes-operator. Currently,
> the latest 4 Flink minor versions are 1.15, 1.16, 1.17 and 1.18.
> But the autoscaler doesn't work for Flink 1.15.
> h2. Root cause:
> * FLINK-28310 added some properties to IOMetricsInfo in flink-1.16
> * IOMetricsInfo is a part of JobDetailsInfo
> * JobDetailsInfo is necessary for the autoscaler [1]
> * Flink's RestClient doesn't tolerate any missing property when deserializing
> the JSON
> That means that a RestClient newer than 1.15 cannot fetch JobDetailsInfo for
> 1.15 jobs.
> h2. How to fix it properly?
> - [FLINK-34655] Copy IOMetricsInfo to the flink-autoscaler-standalone module
> - Remove the copy once 1.15 is no longer supported
> [1]
> https://github.com/apache/flink-kubernetes-operator/blob/ede1a610b3375d31a2e82287eec67ace70c4c8df/flink-autoscaler/src/main/java/org/apache/flink/autoscaler/ScalingMetricCollector.java#L109
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-401%3A+REST+API+JSON+response+deserialization+unknown+field+tolerance

--
This message was sent by Atlassian Jira
(v8.20.10#820010)