Hi Max,

Thanks a lot for the FLIP. It is an extremely attractive feature!

Just a few follow-up questions/thoughts after reading the FLIP:
In the doc, the discussion of the “scaling out” strategy is thorough and
convincing to me, but “scaling in” seems to be less discussed. My two cents
on this aspect:

  1.  For source parallelism, if the user configures a much larger value than
needed, there should be very few pending records, even though the parallelism
could still be optimized. But IIUC, the current algorithm will not take any
action in this case, since the backlog growth rate is almost zero. Is my
understanding correct?
  2.  Compared with “scaling out”, “scaling in” is usually more dangerous, as
it is more likely to have a negative influence on downstream jobs. The min/max
load bounds should be useful here. I am wondering if it is possible to use a
different, more conservative strategy for “scaling in”. Or, going further, to
allow a custom autoscaling strategy (e.g. a time-based strategy). I try to
illustrate both points in the rough sketch after this list.
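
To make the two points above a bit more concrete, here is a rough, purely
illustrative sketch in Java. Nothing below comes from the FLIP or the operator
code base; all class, field and parameter names are made up and only meant to
show what I have in mind:

// Purely illustrative; reflects my reading of the algorithm, not the FLIP.
class ScalingSketch {

    // Simplified per-vertex metrics as I understand them from the FLIP.
    static class VertexMetrics {
        double backlogGrowthRate; // records/sec the backlog grows (~0 for an over-provisioned source)
        double utilization;       // busy-time ratio of the vertex, in [0, 1]
        int currentParallelism;
    }

    // Point 1: with a heavily over-provisioned source, the backlog growth rate
    // is ~0, so a trigger like this never fires and the parallelism is never
    // reduced, even though utilization is very low.
    static boolean triggersScaling(VertexMetrics m, double backlogRateThreshold) {
        return m.backlogGrowthRate > backlogRateThreshold;
    }

    // Point 2: one possible conservative "scaling in" path: only shrink when
    // utilization is below a lower bound, and never by more than a fixed
    // factor per scaling decision.
    static int conservativeScaleIn(
            VertexMetrics m, double lowerUtilizationBound, double maxShrinkFactor) {
        if (m.utilization >= lowerUtilizationBound) {
            return m.currentParallelism; // nothing to do
        }
        int target = (int) Math.ceil(
                m.currentParallelism * m.utilization / lowerUtilizationBound);
        int floor = (int) Math.ceil(
                m.currentParallelism * (1.0 - maxShrinkFactor));
        return Math.max(1, Math.max(target, floor));
    }
}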

Another side thought: to recover a job from a checkpoint/savepoint, the new
parallelism cannot be larger than the max parallelism defined in the
checkpoint (see
this<https://github.com/apache/flink/blob/17a782c202c93343b8884cb52f4562f9c4ba593f/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/Checkpoints.java#L128>).
Not sure if this limit should be mentioned in the FLIP.
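
For illustration only (the class and method names here are made up; the real
limit comes from each operator's max parallelism stored in the checkpoint), a
guard along these lines might be enough:

// Illustrative only: whatever parallelism the autoscaler proposes has to
// respect the max parallelism recorded in the checkpoint, otherwise restoring
// fails with the check in Checkpoints.java linked above.
class MaxParallelismGuard {
    static int clamp(int proposedParallelism, int checkpointMaxParallelism) {
        return Math.min(proposedParallelism, checkpointMaxParallelism);
    }
}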

Again, thanks for the great work, and I am looking forward to using the Flink
Kubernetes operator with it!

Best,
Biao Geng

From: Maximilian Michels <m...@apache.org>
Date: Saturday, November 5, 2022 at 2:37 AM
To: dev <dev@flink.apache.org>
Cc: Gyula Fóra <gyula.f...@gmail.com>, Thomas Weise <t...@apache.org>, Marton 
Balassi <mbala...@apache.org>, Őrhidi Mátyás <matyas.orh...@gmail.com>
Subject: [DISCUSS] FLIP-271: Autoscaling
Hi,

I would like to kick off the discussion on implementing autoscaling for
Flink as part of the Flink Kubernetes operator. I've outlined an approach
here which I find promising:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-271%3A+Autoscaling

I've been discussing this approach with some of the operator contributors:
Gyula, Marton, Matyas, and Thomas (all in CC). We started prototyping an
implementation based on the current FLIP design. If that goes well, we
would like to contribute this to Flink based on the results of the
discussion here.

I'm curious to hear your thoughts.

-Max
