Hi Max,

Thanks a lot for the FLIP. It is an extremely attractive feature!
Just some follow-up questions/thoughts after reading the FLIP:

In the doc, the discussion of the “scaling out” strategy is thorough and convincing to me, but “scaling down” seems to get less discussion. Here are my 2 cents on this aspect:

1. For source parallelism, if the user configures a much larger value than needed, there should be very few pending records, even though the parallelism could still be optimized (i.e., reduced). But IIUC, the current algorithm will not take any action in this case, because the backlog growth rate is almost zero. Is my understanding correct?

2. Compared with “scaling out”, “scaling in” is usually more dangerous, as it is more likely to negatively affect downstream jobs. The min/max load bounds should help here. I am wondering if it is possible to have a different strategy for “scaling in” to make it more conservative. Or, going further, to allow custom autoscaling strategies (e.g., a time-based strategy).

Another side thought: when recovering a job from a checkpoint/savepoint, the new parallelism cannot be larger than the max parallelism defined in the checkpoint (see <https://github.com/apache/flink/blob/17a782c202c93343b8884cb52f4562f9c4ba593f/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/Checkpoints.java#L128>). I'm not sure whether this limit should be mentioned in the FLIP.

Again, thanks for the great work. I'm looking forward to using the Flink k8s operator with it!

Best,
Biao Geng

From: Maximilian Michels <m...@apache.org>
Date: Saturday, November 5, 2022 at 2:37 AM
To: dev <dev@flink.apache.org>
Cc: Gyula Fóra <gyula.f...@gmail.com>, Thomas Weise <t...@apache.org>, Marton Balassi <mbala...@apache.org>, Őrhidi Mátyás <matyas.orh...@gmail.com>
Subject: [DISCUSS] FLIP-271: Autoscaling

Hi,

I would like to kick off the discussion on implementing autoscaling for Flink as part of the Flink Kubernetes operator.
I've outlined an approach here which I find promising:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling

I've been discussing this approach with some of the operator contributors: Gyula, Marton, Matyas, and Thomas (all in CC). We started prototyping an implementation based on the current FLIP design. If that goes well, we would like to contribute this to Flink based on the results of the discussion here.

I'm curious to hear your thoughts.

-Max