[
https://issues.apache.org/jira/browse/FLINK-15325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17226074#comment-17226074
]
Zhu Zhu commented on FLINK-15325:
---------------------------------
Maybe a better option is to ignore the input location preferences for a result
partition if it has too many consumers. The threshold can be configurable, and
the default value can be very large so that there is no behavior change by default.
In this way, we do not need to fully disable input locality. {{FORWARD}} input
locality can still be respected, even for a job like the one in the attached image.
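
A rough sketch of the idea, to make the proposal concrete. This is not Flink
code: the class, the method and the {{maxConsumers}} threshold are all
hypothetical names, and locations are plain strings rather than real
TaskManager locations. A partition's producer locations are only counted as
preferences when the partition has at most {{maxConsumers}} consumers.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; none of these names are existing Flink APIs. */
final class InputLocationFilter {

    /** A consumed result partition: where it is produced and how many tasks consume it. */
    static final class ConsumedPartition {
        final List<String> producerLocations;
        final int numConsumers;

        ConsumedPartition(List<String> producerLocations, int numConsumers) {
            this.producerLocations = producerLocations;
            this.numConsumers = numConsumers;
        }
    }

    /**
     * Collects preferred locations for a task from its inputs, skipping every
     * partition whose consumer count exceeds the (assumed, configurable) threshold.
     */
    static List<String> preferredLocations(List<ConsumedPartition> inputs, int maxConsumers) {
        List<String> preferred = new ArrayList<>();
        for (ConsumedPartition input : inputs) {
            if (input.numConsumers > maxConsumers) {
                continue; // too many consumers: ignore this input's locations
            }
            for (String location : input.producerLocations) {
                if (!preferred.contains(location)) {
                    preferred.add(location);
                }
            }
        }
        return preferred;
    }
}
```

With a very large default such as {{Integer.MAX_VALUE}}, no partition is ever
skipped, so the behavior is unchanged. A pointwise {{FORWARD}} partition with a
single consumer stays below any reasonable threshold and keeps its locality.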
> Input location preference which affects task distribution may make certain
> job performance worse
> -------------------------------------------------------------------------------------------------
>
> Key: FLINK-15325
> URL: https://issues.apache.org/jira/browse/FLINK-15325
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.10.0
> Reporter: Zhu Zhu
> Priority: Major
> Fix For: 1.12.0
>
> Attachments: D58ADB03-7187-46B1-B077-91E5005FD463.png
>
>
> When running TPC-DS jobs in a session cluster, we observed that sometimes
> tasks are not evenly distributed across TMs. The root cause turned out to be
> that downstream tasks tend to be placed TM-local or host-local to their input
> tasks. This helps to reduce network shuffle.
> However, in certain cases, like the topology presented in the attached image,
> jamming the input task's TM and machine with downstream tasks would hurt
> performance. In this case, respecting input location preferences causes
> more trouble than benefit.
> So I'm wondering whether we should introduce a config so that users can
> disable input location preferences?