[
https://issues.apache.org/jira/browse/TEZ-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095044#comment-14095044
]
Siddharth Seth commented on TEZ-1397:
-------------------------------------
Hot spots would be created if the same data (and the same splits) is being
processed. If capacity is not available on the preferred node, the regular
fallback technique would simply move the split to the next node that has
capacity available. Basing this on active caching would lead to similar
behaviour if tasks are always scheduled to the cached location.
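
To make the fallback concrete, here is a minimal sketch of affinity-first
assignment. It is not Tez code: the class AffinitySketch, its methods, and the
slot-counting model are all hypothetical names and assumptions chosen for
illustration only.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Tez code base.
public class AffinitySketch {
    // Free task slots per node, iterated in insertion order.
    private final Map<String, Integer> freeSlots = new LinkedHashMap<>();
    // Node that most recently ran a task for each split in this session.
    private final Map<String, String> lastNodeForSplit = new HashMap<>();

    public AffinitySketch(Map<String, Integer> initialSlots) {
        freeSlots.putAll(initialSlots);
    }

    /** Returns the chosen node, or null if no node has capacity. */
    public String assign(String splitId) {
        // Prefer the node that already processed this split.
        String preferred = lastNodeForSplit.get(splitId);
        if (preferred != null && freeSlots.getOrDefault(preferred, 0) > 0) {
            return take(splitId, preferred);
        }
        // Fallback: the next node on which capacity is available.
        for (Map.Entry<String, Integer> e : freeSlots.entrySet()) {
            if (e.getValue() > 0) {
                return take(splitId, e.getKey());
            }
        }
        return null;
    }

    /** Frees one slot on the given node when a task finishes. */
    public void release(String node) {
        freeSlots.merge(node, 1, Integer::sum);
    }

    private String take(String splitId, String node) {
        freeSlots.merge(node, -1, Integer::sum);
        lastNodeForSplit.put(splitId, node);
        return node;
    }
}
```

In this sketch, repeated tasks on one split keep landing on the same node while
it has a free slot, which is where a hot spot would come from, and spill over
to another node only when it is full.
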
> Node affinity for tasks processing the same splits
> --------------------------------------------------
>
> Key: TEZ-1397
> URL: https://issues.apache.org/jira/browse/TEZ-1397
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
>
> Within a session, if the same set of HDFS blocks is accessed by different
> tasks, these tasks should ideally be launched on the same node for better
> utilization of the buffer cache, etc.
> This will likely end up being another level of requests, higher up than
> NODE_LOCAL, for the scheduler.