[ 
https://issues.apache.org/jira/browse/FLINK-31608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chesnay Schepler updated FLINK-31608:
-------------------------------------
    Description: 
This option was meant to prevent scale-up operations where the benefit doesn't 
outweigh the cost, like scaling up to increase a single vertex's parallelism by 
1. Meanwhile, scale-down operations were always executed immediately, because 
they were always the result of a stopped TaskManager, which caused the job to 
restart anyway.
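A minimal sketch of the rule described above: a net scale-up is applied only if it reaches the threshold, and any net scale-down is applied immediately. The class and method names, and the use of vertex names as map keys, are hypothetical. This is not Flink's actual scheduler code, and it uses the cumulative (summed) change as described in this issue.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the min-parallelism-increase check described in this
 * issue; not Flink's actual scheduler code. Keys are illustrative vertex names.
 */
final class MinIncreaseCheck {

    /** Sum of the parallelism over all vertices. */
    static int totalParallelism(Map<String, Integer> parallelism) {
        return parallelism.values().stream().mapToInt(Integer::intValue).sum();
    }

    /**
     * Returns true if the rescale should be executed: a net scale-down is always
     * applied, a net scale-up only if it reaches minParallelismIncrease.
     */
    static boolean shouldRescale(
            Map<String, Integer> current, Map<String, Integer> target, int minParallelismIncrease) {
        int cumulativeChange = totalParallelism(target) - totalParallelism(current);
        if (cumulativeChange < 0) {
            // Scale-down: previously always caused by a lost TaskManager.
            return true;
        }
        return cumulativeChange >= minParallelismIncrease;
    }

    public static void main(String[] args) {
        Map<String, Integer> current = Map.of("source", 4, "sink", 4);
        // +1 on a single vertex: skipped when the threshold is 2.
        System.out.println(shouldRescale(current, Map.of("source", 5, "sink", 4), 2)); // false
        // -1 on a single vertex: always executed.
        System.out.println(shouldRescale(current, Map.of("source", 3, "sink", 4), 2)); // true
    }
}
```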

Now that users can change the requirements at will, this assumption no longer 
holds (a scale-down no longer implies a lost TaskManager), and the expected 
behavior is undefined.

We need to answer:
* Should there be a dedicated option for limiting scale-down operations if the 
requirements were changed?
* Should the min-parallelism-*increase* option be generalized to a 
min-parallelism-*change* option?
* How should operations that scale some vertices up and others down at the 
same time be handled? So far the decision was based on the cumulative 
parallelism change, but in this case the parallelism distribution can change 
significantly while the cumulative change is 0.
* If a rescale operation was not applied due to these limits, should it be 
_eventually_ applied anyway (e.g., after a timeout)?
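To illustrate the third question, a minimal sketch contrasting the cumulative (net) change with one possible metric for a generalized min-parallelism-change option: the sum of absolute per-vertex changes. The absolute-sum metric is an assumption chosen for illustration, not a decision from this issue, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch comparing two ways of measuring a rescale; neither is
 * Flink's actual implementation. Missing vertices count as parallelism 0.
 */
final class ParallelismChangeMetrics {

    /** Net change: sum(target) - sum(current). Opposite changes cancel out. */
    static int cumulativeChange(Map<String, Integer> current, Map<String, Integer> target) {
        int change = 0;
        for (String vertex : allVertices(current, target)) {
            change += target.getOrDefault(vertex, 0) - current.getOrDefault(vertex, 0);
        }
        return change;
    }

    /** Total movement: sum of |target - current| per vertex. Opposite changes add up. */
    static int absoluteChange(Map<String, Integer> current, Map<String, Integer> target) {
        int change = 0;
        for (String vertex : allVertices(current, target)) {
            change += Math.abs(target.getOrDefault(vertex, 0) - current.getOrDefault(vertex, 0));
        }
        return change;
    }

    private static Set<String> allVertices(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> vertices = new HashSet<>(a.keySet());
        vertices.addAll(b.keySet());
        return vertices;
    }

    public static void main(String[] args) {
        // "map" scales up by 2 while "sink" scales down by 2.
        Map<String, Integer> current = Map.of("map", 2, "sink", 6);
        Map<String, Integer> target = Map.of("map", 4, "sink", 4);
        System.out.println(cumulativeChange(current, target)); // 0
        System.out.println(absoluteChange(current, target));   // 4
    }
}
```

With the cumulative metric this redistribution looks like a no-op, so a threshold-based check would never trigger it; the absolute metric registers it as a change of 4.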

  was:
This option was meant to prevent scale-up operations where the benefit doesn't 
outweigh the cost, like scaling up to increase a single vertex's parallelism by 
1. Meanwhile, scale-down operations were always executed immediately, because 
they were always the result of a stopped TaskManager, which caused the job to 
restart anyway.

Now that users can change the requirements at will, this assumption no longer 
holds (a scale-down no longer implies a lost TaskManager), and the expected 
behavior is undefined.

We need to answer:
* Should there be a dedicated option for limiting scale-down operations if the 
requirements were changed?
* Should the min-parallelism-*increase* option be generalized to a 
min-parallelism-*change* option?
* How should operations that scale some vertices up and others down at the 
same time be handled? So far the decision was based on the cumulative 
parallelism change, but in this case the parallelism distribution can change 
significantly while the cumulative change is 0.


> Re-evaluate 'min-parallelism-increase' option
> ---------------------------------------------
>
>                 Key: FLINK-31608
>                 URL: https://issues.apache.org/jira/browse/FLINK-31608
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Configuration, Runtime / Coordination
>            Reporter: Chesnay Schepler
>            Priority: Major
>             Fix For: 1.18.0
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
