Hi, community!

While working on JIRA [1], a question came up: during adaptive rescaling, which strategy should be used to select candidate slots so that resource utilization is efficient and matches expectations?

We have had some lively discussions and received valuable feedback (thanks to Matthias, Rui, Gyula, Maximilian, Tison, etc.):

- When deploying jobs with an Application cluster, there seems to be a preference for using the fewest TaskManagers.
- When deploying with a Session cluster, a slot spread-out strategy (taskmanager.load-balance.mode=SLOTS) [2] is favored to achieve more balanced resource usage, and the minimal-TaskManagers strategy is ignored.
- Some comments have also suggested introducing configurable parameters to specify strategy priorities, giving users more flexibility when different strategies conflict, as in [3].
- More discussion details and valuable ideas can be found here [4].

Therefore, we need a discussion to clarify these matters and agree on solutions.

It’s worth mentioning that this discussion should also cover taskmanager.load-balance.mode=SLOTS [2], for two main reasons:

- Firstly, the current discussions have already highlighted this issue.
- Secondly, this aspect will soon be introduced with the upcoming JIRA changes.

Based on the existing discussions, I have summarized the conflicting items and proposed some simple alternative strategies [5] for the case where no new configuration items are introduced, in the hope of helping to move the current work forward.

Looking forward to your comments and attention!

Thank you.

[1] https://issues.apache.org/jira/browse/FLINK-33977
[2] https://issues.apache.org/jira/browse/FLINK-33390
[3] https://issues.apache.org/jira/browse/FLINK-36426
[4] https://github.com/apache/flink/pull/25218#issuecomment-2401913141
[5] https://docs.google.com/document/d/1NfY6O8mdkr3gKczJS9dGE3LACrf18sum_6GCaXYq4as/edit?tab=t.0#heading=h.flwxxqng4hh7