[ https://issues.apache.org/jira/browse/FLINK-33977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17950434#comment-17950434 ]

Ferenc Csaky commented on FLINK-33977:
--------------------------------------

Hi [~RocMarshal], I applied the fixVersion. I have not gone through the mailing
list thread and the PR yet, but will do so today. Just to be sure, even before
that: is this obsolete in the 2.x line? Furthermore, I see that the CI is red on
the PR; can you take a look at that? I'm happy to help with the review and the
merge to make sure this ships in 1.20.2.

> Adaptive scheduler may not minimize the number of TMs during downscaling
> ------------------------------------------------------------------------
>
>                 Key: FLINK-33977
>                 URL: https://issues.apache.org/jira/browse/FLINK-33977
>             Project: Flink
>          Issue Type: Improvement
>          Components: Autoscaler, Runtime / Coordination
>    Affects Versions: 1.18.0, 1.19.0, 1.20.0
>            Reporter: Zhanghao Chen
>            Assignee: RocMarshal
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.20.2
>
>         Attachments: screenshot-1.png
>
>
> Adaptive Scheduler uses SlotAssigner to assign free slots to slot sharing
> groups. Currently, there are two implementations of SlotAssigner available:
> the DefaultSlotAssigner, which treats all slots and slot sharing groups
> equally, and the StateLocalitySlotAssigner, which assigns slots based on the
> number of local key groups to utilize local state recovery. The scheduler
> uses the DefaultSlotAssigner when no key group assignment info is available
> and the StateLocalitySlotAssigner otherwise.
>  
> However, neither SlotAssigner aims to minimize the number of TMs, which may
> produce a suboptimal slot assignment in Application Mode. For example, when
> a job with 8 slot sharing groups running on 2 TMs (4 slots each) is
> downscaled through the FLIP-291 API to 4 slot sharing groups, the cluster
> may still have 2 TMs, one with 1 free slot and the other with 3 free slots.
> For end users, this means the downscaling is ineffective, because the total
> cluster resources are not reduced.
>  
> We should take minimizing the number of TMs into consideration as well. A
> possible approach is to enhance the StateLocalitySlotAssigner: when the
> number of free slots exceeds the number needed, sort all TMs by a score
> equal to the sum of the allocation scores of all slots on that TM, remove
> the slots of the excess TMs with the lowest scores, and proceed with the
> remaining slot assignment.
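
The assigner selection described in the first paragraph of the issue can be sketched as follows. This is a simplified, hypothetical model, not Flink's actual SlotAssigner interface: the types Slot, SlotAssignerSketch, DefaultAssignerSketch, LocalityAssignerSketch and AssignerSelection are invented for illustration, and the locality-aware assigner is assumed to be a greedy match on per-TM local key group counts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; none of these types are Flink's real SlotAssigner API.
final class AssignerSelection {
    /** Uses the locality-aware assigner only when key group assignment info is available. */
    static SlotAssignerSketch choose(Optional<Map<String, Map<String, Integer>>> keyGroupInfo) {
        return keyGroupInfo.isPresent()
                ? new LocalityAssignerSketch(keyGroupInfo.get())
                : new DefaultAssignerSketch();
    }

    public static void main(String[] args) {
        List<Slot> slots = List.of(new Slot("s1", "tm1"), new Slot("s2", "tm2"));
        List<String> groups = List.of("g1", "g2");
        System.out.println(choose(Optional.empty()).assign(groups, slots));
        Map<String, Map<String, Integer>> info =
                Map.of("g1", Map.of("tm2", 5), "g2", Map.of("tm1", 3));
        System.out.println(choose(Optional.of(info)).assign(groups, slots));
    }
}

/** A free slot and the TaskManager (TM) it lives on. */
record Slot(String id, String tm) {}

interface SlotAssignerSketch {
    /** Maps each slot sharing group instance to one free slot. Assumes enough free slots. */
    Map<String, Slot> assign(List<String> groups, List<Slot> freeSlots);
}

/** Treats all slots and groups equally: pairs them up in list order. */
final class DefaultAssignerSketch implements SlotAssignerSketch {
    public Map<String, Slot> assign(List<String> groups, List<Slot> freeSlots) {
        Map<String, Slot> result = new HashMap<>();
        for (int i = 0; i < groups.size(); i++) {
            result.put(groups.get(i), freeSlots.get(i));
        }
        return result;
    }
}

/** Greedy: prefers (group, slot) pairs whose TM holds the most local key groups of that group. */
final class LocalityAssignerSketch implements SlotAssignerSketch {
    // Hypothetical input: group -> (TM -> number of that group's key groups with local state on the TM).
    private final Map<String, Map<String, Integer>> localKeyGroups;

    LocalityAssignerSketch(Map<String, Map<String, Integer>> localKeyGroups) {
        this.localKeyGroups = localKeyGroups;
    }

    public Map<String, Slot> assign(List<String> groups, List<Slot> freeSlots) {
        record Candidate(String group, Slot slot, int localKeyGroupCount) {}
        List<Candidate> candidates = new ArrayList<>();
        for (String group : groups) {
            Map<String, Integer> perTm = localKeyGroups.getOrDefault(group, Map.of());
            for (Slot slot : freeSlots) {
                candidates.add(new Candidate(group, slot, perTm.getOrDefault(slot.tm(), 0)));
            }
        }
        // Highest local key group count first; List.sort is stable, so ties keep input order.
        candidates.sort(Comparator.comparingInt(Candidate::localKeyGroupCount).reversed());
        Map<String, Slot> result = new HashMap<>();
        Set<String> usedSlots = new HashSet<>();
        for (Candidate c : candidates) {
            if (!result.containsKey(c.group()) && !usedSlots.contains(c.slot().id())) {
                result.put(c.group(), c.slot());
                usedSlots.add(c.slot().id());
            }
        }
        return result;
    }
}
```

Note that neither sketched assigner looks at how many TMs end up in use, which is exactly the gap the issue describes.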
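
The arithmetic behind the downscaling example in the second paragraph can be checked with a small sketch. It assumes homogeneous TMs with a fixed number of slots each; DownscaleExample, minTaskManagers and busyTaskManagers are hypothetical helpers, not Flink code.

```java
import java.util.Map;

// Hypothetical helpers illustrating the example from the issue (not Flink code).
final class DownscaleExample {
    /** Minimum number of TMs needed when every TM offers slotsPerTm slots (homogeneous TMs assumed). */
    static int minTaskManagers(int requiredSlots, int slotsPerTm) {
        return (requiredSlots + slotsPerTm - 1) / slotsPerTm; // ceiling division
    }

    /** TMs that still host at least one used slot and therefore cannot be released. */
    static long busyTaskManagers(Map<String, Integer> usedSlotsPerTm) {
        return usedSlotsPerTm.values().stream().filter(used -> used > 0).count();
    }

    public static void main(String[] args) {
        int slotsPerTm = 4;
        int required = 4; // after downscaling from 8 to 4 slot sharing groups
        // Scenario from the issue: tm1 uses 3 slots (1 free), tm2 uses 1 slot (3 free).
        Map<String, Integer> spread = Map.of("tm1", 3, "tm2", 1);
        // A TM-minimizing assignment packs all 4 groups onto a single TM.
        Map<String, Integer> packed = Map.of("tm1", 4, "tm2", 0);
        System.out.println("minimum TMs: " + minTaskManagers(required, slotsPerTm));
        System.out.println("busy TMs (spread): " + busyTaskManagers(spread));
        System.out.println("busy TMs (packed): " + busyTaskManagers(packed));
    }
}
```

With 4 required slots and 4 slots per TM, one TM would suffice, yet the spread assignment keeps both TMs busy.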
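
The TM-minimizing step proposed in the last paragraph might look roughly like the following. This is a sketch under stated assumptions: TmMinimizingFilter, ScoredSlot and retainSlots are hypothetical names, the per-slot allocation score is taken as given (the issue does not say how it is computed), and the retained slots would then be passed on to the regular slot assignment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed enhancement; not Flink code.
final class TmMinimizingFilter {
    /**
     * Keeps the slots of the highest-scoring TMs until at least requiredSlots remain,
     * dropping every slot on the remaining (lowest-scoring, excess) TMs.
     */
    static List<ScoredSlot> retainSlots(List<ScoredSlot> freeSlots, int requiredSlots) {
        if (freeSlots.size() <= requiredSlots) {
            return freeSlots; // no excess slots to remove
        }
        // Group slots by TM, preserving first-seen order for deterministic tie-breaking.
        Map<String, List<ScoredSlot>> slotsByTm = new LinkedHashMap<>();
        for (ScoredSlot s : freeSlots) {
            slotsByTm.computeIfAbsent(s.tm(), tm -> new ArrayList<>()).add(s);
        }
        // Sort TMs by descending score, where a TM's score is the sum of its slots' scores.
        List<Map.Entry<String, List<ScoredSlot>>> tms = new ArrayList<>(slotsByTm.entrySet());
        tms.sort(Comparator.comparingLong(
                (Map.Entry<String, List<ScoredSlot>> e) -> tmScore(e.getValue())).reversed());

        List<ScoredSlot> retained = new ArrayList<>();
        for (Map.Entry<String, List<ScoredSlot>> tm : tms) {
            if (retained.size() >= requiredSlots) {
                break; // all remaining TMs are excess; their slots are removed
            }
            retained.addAll(tm.getValue());
        }
        return retained;
    }

    private static long tmScore(List<ScoredSlot> slots) {
        return slots.stream().mapToLong(ScoredSlot::allocationScore).sum();
    }

    public static void main(String[] args) {
        List<ScoredSlot> free = List.of(
                new ScoredSlot("tm1-s1", "tm1", 1), new ScoredSlot("tm1-s2", "tm1", 1),
                new ScoredSlot("tm1-s3", "tm1", 0), new ScoredSlot("tm1-s4", "tm1", 0),
                new ScoredSlot("tm2-s1", "tm2", 3), new ScoredSlot("tm2-s2", "tm2", 0),
                new ScoredSlot("tm2-s3", "tm2", 0), new ScoredSlot("tm2-s4", "tm2", 0));
        // tm1 scores 2 and tm2 scores 3, so only tm2's four slots are kept for 4 groups.
        System.out.println(retainSlots(free, 4));
    }
}

/** Hypothetical: a free slot, its TM, and an allocation score (e.g. local key groups it could serve). */
record ScoredSlot(String slotId, String tm, long allocationScore) {}
```

Ties between TMs with equal scores are broken here by the order in which the TMs first appear; an actual implementation might instead prefer TMs with more free slots, so that fewer TMs are kept.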



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
