[
https://issues.apache.org/jira/browse/FLINK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferenc Csaky closed FLINK-33977.
--------------------------------
Resolution: Fixed
1ac4f3d (https://github.com/apache/flink/commit/1ac4f3d182cba946663d69dc180a6875f17ab542) in master
c691899 (https://github.com/apache/flink/commit/c691899859b8caacf63251ae4de89012ade3836d) in release-2.0
8cd774f (https://github.com/apache/flink/commit/8cd774f7fcf71275a19fc0d7b5bb21a1ad90bd98) in release-1.20
8348f7d (https://github.com/apache/flink/commit/8348f7d96c73418beb8fea26de698566306cc10e) in release-1.19
> Adaptive scheduler may not minimize the number of TMs during downscaling
> ------------------------------------------------------------------------
>
> Key: FLINK-33977
> URL: https://issues.apache.org/jira/browse/FLINK-33977
> Project: Flink
> Issue Type: Improvement
> Components: Autoscaler, Runtime / Coordination
> Affects Versions: 2.0.0, 1.18.0, 1.19.0, 1.20.0
> Reporter: Zhanghao Chen
> Assignee: RocMarshal
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.19.3, 1.20.2, 2.0.1
>
> Attachments: image-2025-05-09-21-52-13-196.png, screenshot-1.png
>
>
> Adaptive Scheduler uses SlotAssigner to assign free slots to slot sharing
> groups. Currently, there are two implementations of SlotAssigner available:
> the DefaultSlotAssigner, which treats all slots and slot sharing groups
> equally, and the StateLocalitySlotAssigner, which assigns slots based on the
> number of local key groups to utilize local state recovery. The scheduler
> uses the DefaultSlotAssigner when no key group assignment info is available
> and the StateLocalitySlotAssigner otherwise.
>
> However, neither SlotAssigner implementation aims to minimize the number of
> TMs, which may produce suboptimal slot assignments in Application Mode. For
> example, when a job with 8 slot sharing groups and 2 TMs (4 slots each) is
> downscaled through the FLIP-291 API to 4 slot sharing groups, the cluster
> may still have 2 TMs, one with 1 free slot and the other with 3 free slots.
> For end users, this means the downscaling is ineffective, as the total
> cluster resources are not reduced.
>
> We should take minimizing the number of TMs into consideration as well. A
> possible approach is to enhance the StateLocalitySlotAssigner: when the
> number of free slots exceeds the need, sort all the TMs by a score obtained
> by summing the allocation scores of all slots on each TM, remove the slots
> of the excess TMs with the lowest scores, and proceed with the remaining
> slot assignment.
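
The approach proposed in the description can be illustrated with a minimal
sketch. This is not the code from the linked commits: the class
TmPackingSketch, the Slot type, and the selectSlots method are hypothetical
names introduced here, and the per-slot score is assumed to be a plain number
standing in for the state-locality allocation score. The sketch keeps whole
TMs in descending score order until enough slots are available, so it may
keep slightly more slots than needed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TmPackingSketch {

    /** Hypothetical stand-in for a free slot: owning TM id plus an allocation score. */
    static final class Slot {
        final String tmId;
        final long score;

        Slot(String tmId, long score) {
            this.tmId = tmId;
            this.score = score;
        }
    }

    /**
     * Returns the slots to keep for assignment. If there are more free slots
     * than needed, slots on the lowest-scoring TMs are dropped so that those
     * TMs end up unused.
     */
    static List<Slot> selectSlots(List<Slot> freeSlots, int needed) {
        if (freeSlots.size() <= needed) {
            return new ArrayList<>(freeSlots); // nothing to trim
        }
        // Group slots by TM and sum the per-slot scores into a TM score.
        Map<String, List<Slot>> slotsByTm = new LinkedHashMap<>();
        Map<String, Long> scoreByTm = new HashMap<>();
        for (Slot slot : freeSlots) {
            slotsByTm.computeIfAbsent(slot.tmId, k -> new ArrayList<>()).add(slot);
            scoreByTm.merge(slot.tmId, slot.score, Long::sum);
        }
        // Highest-scoring TMs first; the sort is stable, so ties keep input order.
        List<String> tms = new ArrayList<>(slotsByTm.keySet());
        tms.sort(Comparator.comparingLong((String tm) -> scoreByTm.get(tm)).reversed());

        // Keep whole TMs until the demand is covered; the rest are excess.
        List<Slot> kept = new ArrayList<>();
        for (String tm : tms) {
            if (kept.size() >= needed) {
                break;
            }
            kept.addAll(slotsByTm.get(tm));
        }
        return kept;
    }

    public static void main(String[] args) {
        // The example from the description: 2 TMs with 4 slots each, 4 slots needed.
        // The scores are made up for illustration.
        List<Slot> free = new ArrayList<>();
        long[] tm1Scores = {3, 2, 2, 1};
        long[] tm2Scores = {1, 1, 0, 0};
        for (long s : tm1Scores) free.add(new Slot("tm-1", s));
        for (long s : tm2Scores) free.add(new Slot("tm-2", s));

        Set<String> usedTms = new HashSet<>();
        for (Slot slot : selectSlots(free, 4)) usedTms.add(slot.tmId);
        System.out.println("TMs still in use: " + usedTms.size());
    }
}
```

With the made-up scores above, all four kept slots come from tm-1, so tm-2
holds no assigned slots and could be released after the downscale.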