[ 
https://issues.apache.org/jira/browse/FLINK-15325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000940#comment-17000940
 ] 

Zhu Zhu commented on FLINK-15325:
---------------------------------

Yes, if the config is enabled, 
{{ExecutionVertex#getPreferredLocationsBasedOnInputs()}} can simply return an 
empty set to achieve the goal.
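
A minimal sketch of that idea follows. It is not Flink's actual code: the class
name, the {{ignoreInputLocations}} flag, and the use of plain strings for
locations are all assumptions made for illustration. Only the method name is
taken from {{ExecutionVertex}}.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical stand-in for a vertex, NOT Flink's ExecutionVertex.
 * Locations are plain strings here purely for illustration.
 */
class InputLocationPreferenceSketch {

    /** Assumed config flag: true means input location preferences are disabled. */
    private final boolean ignoreInputLocations;

    /** Locations (e.g. TM identifiers) where the input tasks are running. */
    private final List<String> inputLocations;

    InputLocationPreferenceSketch(boolean ignoreInputLocations, List<String> inputLocations) {
        this.ignoreInputLocations = ignoreInputLocations;
        this.inputLocations = inputLocations;
    }

    /**
     * Returns an empty set when the flag is on, so that the scheduler sees no
     * input-based preference; otherwise returns the input locations unchanged.
     */
    Collection<String> getPreferredLocationsBasedOnInputs() {
        if (ignoreInputLocations) {
            return Collections.emptySet();
        }
        return inputLocations;
    }
}
```

With the flag on, slot selection would receive no input-based hint for this
vertex and could fall back to whatever default distribution the scheduler uses.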

It looks to me like this could be a problem only if 
1. there is a 1-to-N pattern in the topology (or k-to-N, where k is a small 
number <= 8 and N is much larger), and
2. the number of available slots in the JM SlotPool is much larger than N 
(currently this only happens in batch jobs that run more tasks in earlier 
stages, so that the later stages see more slots than they need), and
3. the tasks are heavily loaded, so that performance degrades when too many of 
them run on the same machine

So yes, it is not a very common case, and at the moment it is not a problem for 
streaming jobs. I don't think this improvement is very urgent.
But given that users may have no other way to solve the problem when it 
happens, and that the config is easy to understand and simple to implement, I 
think it's also valid to introduce such a config.



> Input location preference which affects task distribution may make certain 
> job performance worse 
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-15325
>                 URL: https://issues.apache.org/jira/browse/FLINK-15325
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Zhu Zhu
>            Priority: Major
>         Attachments: D58ADB03-7187-46B1-B077-91E5005FD463.png
>
>
> When running TPC-DS jobs in a session cluster, we observed that sometimes 
> tasks are not evenly distributed across TMs. The root cause turned out to be 
> that downstream tasks tend to be placed TM-local or host-local to their input 
> tasks. This helps to reduce network shuffle. 
> However, in certain cases, like the topology presented in the attached image, 
> jamming the input task's TM and machine with downstream tasks would hurt 
> performance. In this case, respecting input location preferences causes more 
> trouble than benefit.
> So I'm wondering whether we should introduce a config so that users can 
> disable input location preferences?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
