Hi Yuxia,

Thanks for your feedback. The number of potentially stacked operations depends on the configured length of the cooldown period.



The FLIP proposes adding a minimum delay between two scaling
operations. But, indeed, one optimization would be to still stack the
operations that arrive during a cooldown period and, rather than
keeping only the last one, aggregate them so that a single operation
remains when the cooldown period ends. For example, if 3 TaskManagers
come up and 1 goes down during the cooldown period, we could generate
a single scale-up operation of +2 when the period ends.
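
To make the aggregation idea concrete, here is a minimal sketch in Java. It is only an illustration under my own assumptions: the class CooldownAggregator and its methods record and drain are hypothetical names, not part of the FLIP or of the adaptive scheduler code, and it ignores threading and the cooldown timer itself.

```java
import java.util.OptionalInt;

/** Hypothetical helper for illustration only; not an existing Flink class. */
final class CooldownAggregator {
    // Net change in TaskManager count observed since the last drain.
    private int pendingDelta = 0;

    /** Records a resource change seen during the cooldown period (+1 = up, -1 = down). */
    void record(int taskManagerDelta) {
        pendingDelta += taskManagerDelta;
    }

    /**
     * Called when the cooldown period ends: returns the single aggregated
     * change and resets the state. Empty if the changes cancelled out.
     */
    OptionalInt drain() {
        int delta = pendingDelta;
        pendingDelta = 0;
        return delta == 0 ? OptionalInt.empty() : OptionalInt.of(delta);
    }
}
```

With the example above, calling record(+1) three times and record(-1) once, then drain(), would yield a single aggregated change of +2 instead of four stacked operations.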

As a side note regarding your comment that "it'll take a long time to finish all": please keep in mind that reactive mode is (at least for now) only available for streaming pipelines, which in essence process data indefinitely.

Another side note: when you mention "every taskmanager connecting", if you are referring to the start of the pipeline, please keep in mind that the adaptive scheduler has a "waiting for resources" timeout period before starting the pipeline, during which all TaskManagers connect and the parallelism is decided.

Best

Etienne

On 13/06/2023 at 03:58, yuxia wrote:
Hi, Etienne. Thanks for driving it. I have one question about the
mechanism of the cooldown timeout.

According to the Proposed Changes section, if a scaling event is
received during the cooldown period, it'll be stacked and executed
after the period ends. Also, according to the description of
FLINK-21883 [1], the cooldown timeout is meant to avoid rescaling the
job too frequently, because TaskManagers do not all connect at the
same time.

So, is it possible that every taskmanager connecting will produce a
scaling event, and that these get stacked as many scale-up events, so
that it'll take a long time to finish all? Can we just take the last
event?

[1]: https://issues.apache.org/jira/browse/FLINK-21883

Best regards, Yuxia

----- Original Message -----
From: "Etienne Chauchot" <echauc...@apache.org>
To: "dev" <dev@flink.apache.org>, "Robert Metzger" <metrob...@gmail.com>
Sent: Monday, June 12, 2023, 11:34:25 PM
Subject: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

Hi,

I’d like to start a discussion about FLIP-322 [1], which introduces a cooldown period for the adaptive scheduler.

I'd especially like to get your feedback, @Robert, as you opened the related ticket and worked a lot on the reactive mode.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-322+Cooldown+period+for+adaptive+scheduler



Best

Etienne
