[
https://issues.apache.org/jira/browse/FLINK-15959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yangze Guo closed FLINK-15959.
------------------------------
Fix Version/s: 1.19.0
Resolution: Fixed
> Add min number of slots configuration to limit total number of slots
> --------------------------------------------------------------------
>
> Key: FLINK-15959
> URL: https://issues.apache.org/jira/browse/FLINK-15959
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.11.0
> Reporter: YufeiLiu
> Assignee: xiangyu feng
> Priority: Major
> Labels: auto-deprioritized-major, auto-deprioritized-minor,
> auto-unassigned, pull-request-available
> Fix For: 1.19.0
>
>
> Flink removed the `-n` option after FLIP-6; the ResourceManager now starts a
> new worker only when one is required. I think it is still necessary to
> maintain a certain number of slots. The workers providing them would start
> immediately when the ResourceManager starts and would not be released even
> if all of their slots are free.
> Here are some reasons:
> # Users usually know how many resources a single job needs, so initializing
> all workers when the cluster starts can speed up the startup process.
> # Jobs are scheduled in topological order: the next operator is not
> scheduled until the slot for the prior execution has been allocated. In some
> cases the TaskExecutors therefore start in several batches, which can slow
> down startup.
> # Flink supports
> [FLINK-12122|https://issues.apache.org/jira/browse/FLINK-12122] [Spread out
> tasks evenly across all available registered TaskManagers], but it only
> takes effect if all TMs are registered. Starting all TMs at the beginning
> solves this problem.
> *suggestion:*
> * Add the config options "taskmanager.minimum.numberOfTotalSlots" and
> "taskmanager.maximum.numberOfTotalSlots"; the default behavior stays as
> before.
> * Start enough workers to satisfy the minimum number of slots when the
> ResourceManager accepts leadership (subtracting recovered workers).
> * Don't complete slot requests until the minimum number of slots is
> registered, and throw an exception when the maximum is exceeded.
> *update*
> Finally, we'd like to introduce three config options related to the minimum
> resource restriction:
> * slotmanager.min-total-resource.cpu
> * slotmanager.min-total-resource.memory
> * slotmanager.number-of-slots.min
> Note that these options do not take effect for standalone clusters, where
> the number of allocated slots is not controlled by Flink. They are best
> effort: Flink will not block job progress even if the minimum resources
> cannot be guaranteed.
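
The suggestion above to start enough workers to cover the minimum number of
slots, minus recovered workers, comes down to a ceiling division. The following
is a minimal sketch in plain Java, not Flink's actual implementation: the class
name, method name, and parameters are hypothetical, and it assumes every worker
offers the same number of slots.

```java
// Hypothetical helper for illustration only; this is not a Flink API.
final class MinSlotsSketch {

    /**
     * Returns how many additional workers would have to be started so that
     * the total slot count reaches minSlots.
     *
     * Assumption: all workers, recovered or new, offer slotsPerWorker slots.
     */
    static int workersToStart(int minSlots, int recoveredWorkers, int slotsPerWorker) {
        if (slotsPerWorker <= 0) {
            throw new IllegalArgumentException("slotsPerWorker must be positive");
        }
        if (minSlots < 0 || recoveredWorkers < 0) {
            throw new IllegalArgumentException("counts must not be negative");
        }
        // Slots already provided by workers recovered from a previous attempt.
        long recoveredSlots = (long) recoveredWorkers * slotsPerWorker;
        // Slots still missing; never negative if recovery already covers the minimum.
        long missingSlots = Math.max(0L, minSlots - recoveredSlots);
        // Ceiling division: a partially needed worker still has to be started.
        return (int) ((missingSlots + slotsPerWorker - 1) / slotsPerWorker);
    }

    public static void main(String[] args) {
        // Example values are made up: minimum of 10 slots, 1 recovered worker,
        // 4 slots per worker.
        System.out.println(workersToStart(10, 1, 4));
    }
}
```

With a minimum of 10 slots, one recovered worker, and 4 slots per worker, 6
slots are missing, so two more workers would be requested. Since the options
are described as best effort, such a count would be a target for worker
requests, not a precondition for scheduling.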
--
This message was sent by Atlassian Jira
(v8.20.10#820010)