[jira] [Comment Edited] (FLINK-31757) Optimize Flink un-balanced tasks scheduling

Rui Fan (Jira) Thu, 13 Apr 2023 08:09:03 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-31757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17711932#comment-17711932
 ]


Rui Fan edited comment on FLINK-31757 at 4/13/23 3:08 PM:
----------------------------------------------------------

Hi [~chesnay], thanks for your reply.
{quote}The obvious solution for the user is to set the parallelism to 100 for 
everything if the described issues are a problem.
{quote}
In some scenarios, setting one high parallelism for every task wastes resources, 
and a low parallelism for some tasks is the better choice. For example, a Flink 
job may have many sources, each with only 5 partitions, so a parallelism of 5 
per source is enough.

Or the business logic is very complex: the Flink job has dozens of tasks, and 
the user sets a reasonable parallelism for each task according to its busy ratio 
(similar to FLIP-AutoScalar).

In general, it is common for the tasks of a job to have different parallelisms. 
In that scenario, it is bad for resource balance if the first TMs run a large 
number of tasks while the later TMs run only a few.

 

This is a Flink job DAG from our production, and it is very complex. Setting the 
same high parallelism for every task would cause several problems:
 * It needs too much network memory
 * The JM has to schedule more tasks, so the job starts more slowly
 * It creates too many task threads

!image-2023-04-13-08-04-04-667.png!
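
To give a rough sense of scale, here is a back-of-the-envelope sketch. It uses the numbers from the issue description (21 vertices, one with parallelism 100 and twenty with parallelism 5), not our production DAG. The function name {{total_subtasks}} is made up for illustration and is not a Flink API.

```python
def total_subtasks(parallelisms):
    """Count the subtasks to deploy: one per parallel instance of each vertex."""
    return sum(parallelisms)


# Per-vertex parallelism from the issue description: A = 100, the other 20 = 5.
tuned = [100] + [5] * 20
# The "set everything to 100" alternative from the quote above.
uniform = [100] * 21

print(total_subtasks(tuned))    # 200
print(total_subtasks(uniform))  # 2100
```

Under these assumptions, the uniform setting deploys about ten times as many subtasks, and each extra subtask needs to be scheduled and needs its own thread and network buffers.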



> Optimize Flink un-balanced tasks scheduling
> -------------------------------------------
>
>                 Key: FLINK-31757
>                 URL: https://issues.apache.org/jira/browse/FLINK-31757
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: RocMarshal
>            Assignee: RocMarshal
>            Priority: Major
>         Attachments: image-2023-04-13-08-04-04-667.png
>
>
> Suppose we have a job with 21 {{JobVertex}} instances. The parallelism of 
> vertex A is 100, and the parallelism of each of the others is 5. If each 
> {{TaskManager}} has only one slot, then we need 100 TMs.
> Then 5 slots will each hold 21 subtasks, while the other 95 will hold only 
> one subtask of A. Does this mean we have to make a trade-off between wasted 
> resources and insufficient resources?
> From a resource utilization point of view, we expect all subtasks to be 
> evenly distributed across the TMs.
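
The distribution described above can be reproduced with a small simulation. This is a simplified model written for illustration, not Flink's scheduler: it assumes that all vertices share one slot sharing group, that subtask i of every vertex lands in slot i, and that a slot holds at most one subtask per vertex. Both function names are hypothetical, and the "even" strategy is just one possible greedy placement, not the approach proposed in this ticket.

```python
from collections import Counter


def index_aligned(parallelisms, num_slots):
    """Simplified slot sharing: subtask i of every vertex goes to slot i."""
    load = [0] * num_slots
    for p in parallelisms:
        for i in range(p):
            load[i] += 1
    return load


def evenly_spread(parallelisms, num_slots):
    """Greedy alternative: put each vertex's subtasks on the least-loaded slots,
    with at most one subtask of a given vertex per slot (needs p <= num_slots)."""
    load = [0] * num_slots
    for p in sorted(parallelisms, reverse=True):
        # sorted() is stable, so ties are broken by the lower slot index.
        targets = sorted(range(num_slots), key=lambda s: load[s])[:p]
        for s in targets:
            load[s] += 1
    return load


# Vertex A has parallelism 100, the other 20 vertices have parallelism 5.
job = [100] + [5] * 20

# Each pair is (subtasks per slot, number of slots with that load).
print(sorted(Counter(index_aligned(job, 100)).items()))  # [(1, 95), (21, 5)]
print(sorted(Counter(evenly_spread(job, 100)).items()))  # [(2, 100)]
```

In this model the same 200 subtasks fit as 2 per slot on all 100 slots, instead of 21 on five slots and 1 on the rest.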



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
