[ https://issues.apache.org/jira/browse/FLINK-31757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17712173#comment-17712173 ]
Weihua Hu commented on FLINK-31757:
-----------------------------------
I would like to bring up a common scenario in which the tasks can't all be
set to the same parallelism.
Some ETL pipeline jobs consume data from Kafka and then perform a heavy
transformation in a Map operation. In this scenario, we can't use a single
global parallelism, because the source parallelism is limited by the number
of Kafka partitions.
> Optimize Flink un-balanced tasks scheduling
> -------------------------------------------
>
> Key: FLINK-31757
> URL: https://issues.apache.org/jira/browse/FLINK-31757
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Reporter: RocMarshal
> Assignee: RocMarshal
> Priority: Major
> Attachments: image-2023-04-13-08-04-04-667.png
>
>
> Suppose we have a job with 21 {{JobVertex}}. The parallelism of vertex A is
> 100, and that of each of the others is 5. If each {{TaskManager}} has only
> one slot, then we need 100 TMs.
> 5 slots will each hold 21 subtasks, while the other 95 slots will each hold
> only one subtask of A. Does this mean we have to make a trade-off between
> wasted resources and insufficient resources?
> From a resource utilization point of view, we expect all subtasks to be
> evenly distributed on each TM.
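The slot arithmetic in the quoted description can be sketched as follows. This is a simplified, hypothetical model rather than Flink's scheduler: all vertices share one slot-sharing group (so a shared slot holds at most one subtask per vertex), each TaskManager has exactly one slot, and the "naive" placement puts subtask i of every vertex into shared slot i. The round-robin "balanced" placement is one possible way to reach the even distribution the issue asks for, not the design adopted in FLINK-31757.

```python
# Hypothetical model of the example in FLINK-31757 (not Flink's actual code):
# one slot-sharing group, one slot per TaskManager, slot count = max parallelism.

def index_aligned_loads(parallelisms):
    """Naive placement: subtask i of every vertex goes to shared slot i."""
    num_slots = max(parallelisms)
    loads = [0] * num_slots
    for p in parallelisms:
        for i in range(p):
            loads[i] += 1
    return loads


def balanced_loads(parallelisms):
    """Round-robin placement: each vertex continues where the previous stopped.

    A vertex has at most num_slots subtasks, so its subtasks land in
    distinct slots and the one-subtask-per-vertex-per-slot rule still holds.
    """
    num_slots = max(parallelisms)
    loads = [0] * num_slots
    cursor = 0
    for p in parallelisms:
        for _ in range(p):
            loads[cursor] += 1
            cursor = (cursor + 1) % num_slots
    return loads


if __name__ == "__main__":
    parallelisms = [100] + [5] * 20  # vertex A plus the 20 other vertices
    naive = index_aligned_loads(parallelisms)
    balanced = balanced_loads(parallelisms)
    print(len(naive), max(naive), naive.count(21), naive.count(1))
    print(min(balanced), max(balanced))
```

With 100 + 20 × 5 = 200 subtasks on 100 single-slot TMs, the naive placement yields 5 slots with 21 subtasks and 95 slots with 1, while the round-robin placement gives every slot exactly 2.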
--
This message was sent by Atlassian Jira
(v8.20.10#820010)