[ 
https://issues.apache.org/jira/browse/TEZ-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095044#comment-14095044
 ] 

Siddharth Seth commented on TEZ-1397:
-------------------------------------

Hot spots would be created if the same data (and therefore the same 
splits) is processed repeatedly. If capacity is not available on the 
preferred node, the regular fallback technique would just take the split 
to the next node on which capacity is available. Basing this on active 
caching would lead to similar behaviour if tasks are always scheduled to 
the cached location.
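
To make the fallback concrete, here is a minimal Java sketch of affinity 
with fallback. It is not Tez scheduler code: the class AffinitySketch, its 
methods, and the per-node slot counts are hypothetical names made up for 
illustration, and updating the affinity hint after a fallback is an 
assumption, not something decided in this issue.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of Tez. */
class AffinitySketch {
    // Free task slots per node, kept in a stable iteration order.
    private final Map<String, Integer> freeSlots = new LinkedHashMap<>();
    // Node that most recently processed each split (the affinity hint).
    private final Map<String, String> lastNodeForSplit = new HashMap<>();

    AffinitySketch(List<String> nodes, int slotsPerNode) {
        for (String node : nodes) {
            freeSlots.put(node, slotsPerNode);
        }
    }

    /** Returns the node chosen for the split, or null if no node has capacity. */
    String assign(String splitId) {
        // Prefer the node that handled this split before, if it has a free slot.
        String preferred = lastNodeForSplit.get(splitId);
        if (preferred != null && freeSlots.getOrDefault(preferred, 0) > 0) {
            return take(splitId, preferred);
        }
        // Fallback: the next node with available capacity.
        for (Map.Entry<String, Integer> e : freeSlots.entrySet()) {
            if (e.getValue() > 0) {
                return take(splitId, e.getKey());
            }
        }
        return null;
    }

    /** Frees one slot on the given node when a task finishes. */
    void release(String node) {
        freeSlots.merge(node, 1, Integer::sum);
    }

    private String take(String splitId, String node) {
        freeSlots.merge(node, -1, Integer::sum);
        // Assumption: the hint follows the split to wherever it last ran.
        lastNodeForSplit.put(splitId, node);
        return node;
    }

    public static void main(String[] args) {
        AffinitySketch s = new AffinitySketch(List.of("node1", "node2"), 2);
        for (int i = 0; i < 3; i++) {
            System.out.println("split-A -> " + s.assign("split-A"));
        }
    }
}
```

In this sketch, repeated requests for the same split keep landing on one 
node until its slots run out, which is the hot spot described above; only 
then does the split move to another node.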

> Node affinity for tasks processing the same splits
> --------------------------------------------------
>
>                 Key: TEZ-1397
>                 URL: https://issues.apache.org/jira/browse/TEZ-1397
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>
> Within a session, if the same set of HDFS blocks is accessed by different 
> tasks, these tasks should ideally be launched on the same node for better 
> utilization of the buffer cache, etc.
> This will likely end up being another level of requests, higher up than 
> NODE_LOCAL, for the scheduler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
