[ https://issues.apache.org/jira/browse/FLINK-31757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711932#comment-17711932 ]
Rui Fan commented on FLINK-31757:
---------------------------------
Hi [~chesnay] , thanks for your reply.
{quote}The obvious solution for the user is to set the parallelism to 100 for
everything if the described issues are a problem.
{quote}
In some scenarios, setting one parallelism for the whole job wastes resources,
and setting a low parallelism for some tasks is the better choice. For example,
a Flink job may have many sources, each with only 5 partitions, so a
parallelism of 5 for each source is enough.
Or the business logic is very complex: the Flink job has dozens of tasks, and
the user sets a reasonable parallelism for each task according to its busy
ratio (similar to FLIP-AutoScaler).
In general, it is common for the tasks of one job to have different
parallelisms. In this scenario, it is bad for resource balance that the first
TMs run a large number of tasks while the subsequent TMs run only a few.
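The imbalance between the first and later TMs can be sketched with a simplified model. It assumes a single slot-sharing group in which slot i hosts subtask i of every vertex whose parallelism is greater than i, and that slots are packed into TMs in index order. Both assumptions, the function names, and the example parallelisms (four sources at 5 plus one operator at 20, 4 slots per TM) are hypothetical illustrations, not Flink APIs or scheduler code.

```python
from typing import List


def subtasks_per_slot(parallelisms: List[int]) -> List[int]:
    # Simplified slot-sharing model (assumption): one shared group, and slot i
    # hosts subtask i of every vertex whose parallelism is greater than i.
    num_slots = max(parallelisms)
    return [sum(1 for p in parallelisms if p > i) for i in range(num_slots)]


def subtasks_per_tm(parallelisms: List[int], slots_per_tm: int) -> List[int]:
    # Hypothetical packing: slots are handed out TM by TM in index order,
    # so TM t owns slots [t * slots_per_tm, (t + 1) * slots_per_tm).
    per_slot = subtasks_per_slot(parallelisms)
    return [
        sum(per_slot[i:i + slots_per_tm])
        for i in range(0, len(per_slot), slots_per_tm)
    ]


# Four sources with 5 partitions each, plus one heavier operator at 20.
job = [5, 5, 5, 5, 20]
print(subtasks_per_tm(job, slots_per_tm=4))  # [20, 8, 4, 4, 4]
```

Under these assumptions the first TM runs 20 subtasks while the last three run 4 each, even though the 40 subtasks would fit evenly at 8 per TM.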
> Optimize Flink un-balanced tasks scheduling
> -------------------------------------------
>
> Key: FLINK-31757
> URL: https://issues.apache.org/jira/browse/FLINK-31757
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Reporter: RocMarshal
> Assignee: RocMarshal
> Priority: Major
>
> Suppose a job has 21 vertices ({{JobVertex}}). The parallelism of vertex A is
> 100, and that of the others is 5. If each {{TaskManager}} has only one slot,
> we need 100 TMs.
> Then 5 slots will each run 21 subtasks (one from every vertex), while each of
> the other 95 slots runs only one subtask of A. Does this mean we have to make
> a trade-off between wasted resources and insufficient resources?
> From a resource utilization point of view, we expect all subtasks to be
> evenly distributed across the TMs.
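The slot counts in the issue's example can be checked with a small sketch. It assumes all 21 vertices share one slot-sharing group and that slot i hosts subtask i of each vertex whose parallelism is greater than i. For comparison it adds a hypothetical greedy balancer (a plain least-loaded heuristic, not the approach proposed in FLINK-31757) that places each vertex's subtasks on the currently least-loaded slots, never putting two subtasks of the same vertex in one slot. The function names are illustrative, not Flink APIs.

```python
from typing import List


def default_slot_loads(parallelisms: List[int]) -> List[int]:
    # Simplified model (assumption): one slot-sharing group, and subtask i of
    # every vertex lands in slot i.
    num_slots = max(parallelisms)
    return [sum(1 for p in parallelisms if p > i) for i in range(num_slots)]


def balanced_slot_loads(parallelisms: List[int]) -> List[int]:
    # Hypothetical greedy balancer: handle vertices from highest to lowest
    # parallelism and put each vertex's p subtasks on the p least-loaded
    # distinct slots (ties broken by slot index).
    num_slots = max(parallelisms)
    loads = [0] * num_slots
    for p in sorted(parallelisms, reverse=True):
        chosen = sorted(range(num_slots), key=lambda s: (loads[s], s))[:p]
        for s in chosen:
            loads[s] += 1
    return loads


# Vertex A with parallelism 100, plus 20 vertices with parallelism 5.
job = [100] + [5] * 20
default = default_slot_loads(job)
balanced = balanced_slot_loads(job)
print(max(default), min(default))    # 21 1
print(max(balanced), min(balanced))  # 2 2
```

Both layouts run the same 200 subtasks in 100 slots; under these assumptions the greedy layout spreads them 2 per slot instead of 21 in five slots and 1 in each of the rest.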
--
This message was sent by Atlassian Jira
(v8.20.10#820010)