[
https://issues.apache.org/jira/browse/FLINK-31608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chesnay Schepler updated FLINK-31608:
-------------------------------------
Description:
This option was meant to prevent scale-up operations where the benefit doesn't
outweigh the cost, like scaling up to increase a single vertex's parallelism by
1. Meanwhile, scale-down operations were always immediately executed, because
they were always the result of a stopped TaskManager, causing the job to
restart anyway.
Now that users can change the requirements at will, this no longer holds, and
the expected behavior is overall undefined.
We need to answer:
* Should there be a dedicated option for limiting scale-down operations if the
requirements were changed?
* Should the min-parallelism-*increase* option be generalized to a
min-parallelism-*change* option?
* How shall operations be handled that scale different vertices up or down at
the same time? So far the decision was made on the cumulative parallelism
change, but in this case the parallelism distribution can change significantly
while the cumulative change is 0.
* If a rescale operation was not applied due to these limits, should it be
_eventually_ applied anyway (e.g., after a timeout)?
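
To illustrate the third question, here is a minimal sketch of how a check based
on the cumulative parallelism change can miss a significant redistribution. The
class, the method names, the vertex names, and the threshold value are all
hypothetical and are not taken from the Flink code base; the sum of absolute
per-vertex changes is shown only as one possible alternative metric, not as a
proposed solution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RescaleCheckSketch {

    // Sum of signed per-vertex parallelism deltas (the "cumulative change").
    static int cumulativeChange(Map<String, Integer> current, Map<String, Integer> proposed) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : proposed.entrySet()) {
            sum += e.getValue() - current.getOrDefault(e.getKey(), 0);
        }
        return sum;
    }

    // Sum of absolute per-vertex deltas; also captures pure redistribution.
    static int totalAbsoluteChange(Map<String, Integer> current, Map<String, Integer> proposed) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : proposed.entrySet()) {
            sum += Math.abs(e.getValue() - current.getOrDefault(e.getKey(), 0));
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical job with two vertices.
        Map<String, Integer> current = new LinkedHashMap<>();
        current.put("source", 4);
        current.put("map", 4);

        // One vertex is scaled down by 3, the other scaled up by 3.
        Map<String, Integer> proposed = new LinkedHashMap<>();
        proposed.put("source", 1);
        proposed.put("map", 7);

        int minParallelismIncrease = 1; // assumed threshold, for illustration only

        int cumulative = cumulativeChange(current, proposed);
        int absolute = totalAbsoluteChange(current, proposed);

        System.out.println("cumulative change: " + cumulative);
        System.out.println("total absolute change: " + absolute);
        System.out.println("passes cumulative check: " + (cumulative >= minParallelismIncrease));
    }
}
```

In this example the cumulative change is 0, so a threshold applied to it would
reject the rescale even though six slots' worth of parallelism moved between
vertices.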
was:
This option was meant to prevent scale-up operations where the benefit doesn't
outweigh the cost, like scaling up to increase a single vertex's parallelism by
1. Meanwhile, scale-down operations were always immediately executed, because
they were always the result of a stopped TaskManager, causing the job to
restart anyway.
Now that users can change the requirements at will, this no longer holds, and
the expected behavior is overall undefined.
We need to answer:
* Should there be a dedicated option for limiting scale-down operations if the
requirements were changed?
* Should the min-parallelism-*increase* option be generalized to a
min-parallelism-*change* option?
* How shall operations be handled that scale different vertices up or down at
the same time? So far the decision was made on the cumulative parallelism
change, but in this case the parallelism distribution can change significantly
while the cumulative change is 0.
> Re-evaluate 'min-parallelism-increase' option
> ---------------------------------------------
>
> Key: FLINK-31608
> URL: https://issues.apache.org/jira/browse/FLINK-31608
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Configuration, Runtime / Coordination
> Reporter: Chesnay Schepler
> Priority: Major
> Fix For: 1.18.0