Re: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

Etienne Chauchot Fri, 16 Jun 2023 06:47:40 -0700

Hi Robert,

Thanks for your feedback. I don't know the scheduler part well enough yet and I'm taking this ticket as a learning workshop.


Regarding your comments:

1. Looking at the AdaptiveScheduler class, which takes all its configuration from the JobManagerOptions, and to be consistent with the names of the other parameters, I'd suggest "jobmanager.scheduler-scaling-cooldown-period".

2. I thought scaling events existed already and that the scheduler received them, as mentioned in FLIP-160 (cf "Whenever the scheduler is in the Executing state and receives new slots") or in FLIP-138 (cf "Whenever new slots are available the SlotPool notifies the Scheduler"). If that is not the case (that is, if it is the scheduler that asks for slots), then there is indeed no need to store scaling requests.


=> I need a confirmation here

3. If we lose the JobManager, we lose both the AdaptiveScheduler state and the CoolDownTimer state. So, upon recovery, it would be as if there was no ongoing coolDown period. A first re-scale could then happen right away and would start a coolDown period. A second re-scale would have to wait for the end of this period.

4. When a pipeline is re-scaled, it is restarted. Upon restart, the AdaptiveScheduler goes through the "waiting for resources" state again, as FLIP-160 suggests. If so, then the coolDown period seems somewhat redundant with the resource-stabilization-timeout. I guess that is not the case, otherwise the FLINK-21883 ticket would not have been created.


=> I need a confirmation here also.
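
To make point 3 concrete, here is a minimal sketch of a cooldown tracker that lives only in JobManager memory. The class and method names are hypothetical and are not taken from the FLIP or from Flink; the only assumption is that the last re-scale time is not persisted anywhere.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical in-memory cooldown tracker; not the FLIP's actual class. */
final class CooldownTimer {
    private final Duration cooldown;
    // null means no re-scale has happened since this JobManager started.
    private Instant lastRescale;

    CooldownTimer(Duration cooldown) {
        this.cooldown = cooldown;
    }

    /** True if no cooldown is running at the given time. */
    boolean canRescale(Instant now) {
        return lastRescale == null || !now.isBefore(lastRescale.plus(cooldown));
    }

    /** Records a re-scale, which starts a new cooldown period. */
    void onRescale(Instant now) {
        lastRescale = now;
    }
}
```

A recovered JobManager would build a fresh CooldownTimer, so canRescale returns true immediately, whatever the previous instance had recorded. Robert's alternative (wait a full period after recovery) would amount to initializing lastRescale to the recovery time instead of null.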


Thanks for your views on points 2 and 4.


Best

Etienne

On 15/06/2023 at 13:35, Robert Metzger wrote:

Thanks for the FLIP.

Some comments:
1. Can you specify the full proposed configuration name?
"scaling-cooldown-period" is probably not the full config name?
2. Why is the concept of scaling events and a scaling queue needed? If I
remember correctly, the adaptive scheduler will just check how many
TaskManagers are available and then adjust the execution graph accordingly.
There's no need to store a number of scaling events. We just need to
determine the time to trigger an adjustment of the execution graph.
3. What's the behavior with respect to JobManager failures (e.g. we lose the state
of the Adaptive Scheduler?). My proposal would be to just reset the
cooldown period, so after recovery of a JobManager, we have to wait at
least for the cooldown period until further scaling operations are done.
4. What's the relationship to the
"jobmanager.adaptive-scheduler.resource-stabilization-timeout"
configuration?

Thanks a lot for working on this!

Best,
Robert

On Wed, Jun 14, 2023 at 3:38 PM Etienne Chauchot<[email protected]>
wrote:

Hi all,

@Yuxia, I updated the FLIP to include the aggregation of the stacked
operations that we discussed below. PTAL.

Best

Etienne


On 13/06/2023 at 16:31, Etienne Chauchot wrote:

Hi Yuxia,

Thanks for your feedback. The number of potentially stacked operations
depends on the configured length of the cooldown period.



The proposition in the FLIP is to add a minimum delay between 2 scaling
operations. But, indeed, an optimization could be to still stack the
operations (that arrive during a cooldown period) but maybe not take
only the last operation but rather aggregate them in order to end up
with a single aggregated operation when the cooldown period ends. For
example, let's say 3 taskManagers come up and 1 comes down during the
cooldown period, we could generate a single operation of scale up +2
when the period ends.
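
A minimal sketch of that aggregation, assuming each TaskManager arrival or departure during the cooldown is reported as a separate event. The class and method names are hypothetical and do not come from the FLIP or from Flink.

```java
import java.util.OptionalInt;

/** Hypothetical helper: folds events seen during a cooldown into one net delta. */
final class StackedScalingAggregator {
    private int netTaskManagerDelta = 0;

    /** A TaskManager came up during the cooldown period. */
    void taskManagerAdded() {
        netTaskManagerDelta++;
    }

    /** A TaskManager went down during the cooldown period. */
    void taskManagerRemoved() {
        netTaskManagerDelta--;
    }

    /**
     * Called when the cooldown ends. Returns the single aggregated change,
     * or empty if arrivals and departures cancelled out, then resets.
     */
    OptionalInt drain() {
        int delta = netTaskManagerDelta;
        netTaskManagerDelta = 0;
        return delta == 0 ? OptionalInt.empty() : OptionalInt.of(delta);
    }
}
```

With three taskManagerAdded() calls and one taskManagerRemoved() call, drain() yields a single change of +2, matching the example above. Only one counter is kept, so the memory used does not grow with the number of events or the length of the cooldown period.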

As a side note regarding your comment on "it'll take a long time to
finish all", please keep in mind that the reactive mode (at least for
now) is only available for streaming pipelines, which are in essence
infinite processing.

Another side note: when you mention "every taskManagers connecting",
if you are referring to the start of the pipeline, please keep in mind
that the adaptive scheduler has a "waiting for resources" timeout
period before starting the pipeline in which all taskmanagers connect
and the parallelism is decided.

Best

Etienne

On 13/06/2023 at 03:58, yuxia wrote:

Hi, Etienne. Thanks for driving it. I have one question about the
mechanism of the cooldown timeout.

From the Proposed Changes part, if a scaling event is received and
it falls during the cooldown period, it'll be stacked to be executed
after the period ends. Also, from the description of FLINK-21883[1],
cooldown timeout is to avoid rescaling the job very frequently,
because TaskManagers are not all connecting at the same time.

So, is it possible that every TaskManager connecting will produce a
scaling event, and that the many stacked scale-up events will then
take a long time to finish? Can we just take the last event?

[1]:https://issues.apache.org/jira/browse/FLINK-21883

Best regards, Yuxia

----- Original Message -----
From: "Etienne Chauchot"<[email protected]>
To: "dev"<[email protected]>, "Robert Metzger"<[email protected]>
Sent: Monday, June 12, 2023, 11:34:25 PM
Subject: [DISCUSS] FLIP-322 Cooldown period for adaptive scheduler

Hi,

I’d like to start a discussion about FLIP-322 [1] which introduces a
cooldown period for the adaptive scheduler.

I'd like to get your feedback especially @Robert as you opened the
related ticket and worked on the reactive mode a lot.

[1]

https://cwiki.apache.org/confluence/display/FLINK/FLIP-322+Cooldown+period+for+adaptive+scheduler

Best

Etienne
