[ 
https://issues.apache.org/jira/browse/FLINK-31757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712173#comment-17712173
 ] 

Weihua Hu commented on FLINK-31757:
-----------------------------------

I would like to raise a common scenario in which the tasks can't all be set to 
the same parallelism.


Some ETL pipeline jobs consume data from Kafka and then do some heavy 
transformation in a Map operation. In this scenario, we can't set one 
parallelism globally, because the source parallelism is limited by the number 
of Kafka partitions. 

> Optimize Flink un-balanced tasks scheduling
> -------------------------------------------
>
>                 Key: FLINK-31757
>                 URL: https://issues.apache.org/jira/browse/FLINK-31757
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: RocMarshal
>            Assignee: RocMarshal
>            Priority: Major
>         Attachments: image-2023-04-13-08-04-04-667.png
>
>
> Suppose we have a job with 21 {{JobVertex}} instances. The parallelism of 
> vertex A is 100, and that of the others is 5. If each {{TaskManager}} has 
> only one slot, then we need 100 TMs.
> There will be 5 slots with 21 subtasks each, and the other 95 slots will 
> each have only one subtask of A. Does this mean we have to make a trade-off 
> between wasted resources and insufficient resources?
> From a resource utilization point of view, we expect all subtasks to be 
> evenly distributed across the TMs.
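
The arithmetic in the quoted description can be checked with a small sketch. It assumes a simplified model in which all vertices share one slot sharing group and subtask i of every vertex lands in slot i. The function name `subtasks_per_slot` and this model are assumptions made for illustration; this is not Flink's scheduler code.

```python
from collections import Counter


def subtasks_per_slot(parallelisms):
    """Return a list whose entry i is the number of subtasks in slot i.

    Simplified model (an assumption): one slot sharing group, and
    subtask i of each vertex is placed in slot i.
    """
    num_slots = max(parallelisms)
    return [sum(1 for p in parallelisms if p > i) for i in range(num_slots)]


# Vertex A has parallelism 100; the other 20 vertices have parallelism 5.
parallelisms = [100] + [5] * 20
load = subtasks_per_slot(parallelisms)
histogram = Counter(load)

print("slots needed:", len(load))
print("slots per subtask count:", dict(histogram))
print("average subtasks per slot:", sum(load) / len(load))
```

Under this model, 5 slots carry 21 subtasks each and the remaining 95 carry one, while an even spread of the same 200 subtasks would put 2 in each slot.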



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
