[ https://issues.apache.org/jira/browse/TEZ-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280792#comment-16280792 ]

Eric Wohlstadter commented on TEZ-3872:
---------------------------------------

[~gopalv] [~ashutoshc] [~rajesh.balamohan] [~sseth]

I hacked up the code to do essentially what Gopal describes (the hack doesn't 
add a new API but the effect is the same). 
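
For concreteness, here is a minimal standalone sketch of the preference order the 
hack aims for. This is an illustrative model only, not the actual 
TaskSchedulerManager patch; the class and method names below are made up.

import java.util.Optional;

// Illustrative model of the scheduling preference described above:
// prefer the exact container; if it has been released, fall back to the
// container's recorded host, then its rack, before going off-switch.
// Names here are hypothetical, not Tez APIs.
public class OneToOneLocalityHint {

  public enum Level { CONTAINER, HOST, RACK, ANY }

  public static final class Hint {
    public final Level level;
    public final String value; // containerId, hostname, or rack label

    Hint(Level level, String value) {
      this.level = level;
      this.value = value;
    }

    @Override
    public String toString() {
      return level + (value == null ? "" : "(" + value + ")");
    }
  }

  /**
   * Pick the strongest locality hint still available for a ONE_TO_ONE
   * downstream task, given what we remember about the upstream attempt.
   */
  public static Hint pickHint(String upstreamContainerId,
                              boolean containerStillHeld,
                              Optional<String> upstreamHost,
                              Optional<String> upstreamRack) {
    if (containerStillHeld) {
      return new Hint(Level.CONTAINER, upstreamContainerId);
    }
    // Container was released: today this degrades to ANY; the change
    // discussed here keeps the host/rack recorded from ta.containerNodeId.
    if (upstreamHost.isPresent()) {
      return new Hint(Level.HOST, upstreamHost.get());
    }
    if (upstreamRack.isPresent()) {
      return new Hint(Level.RACK, upstreamRack.get());
    }
    return new Hint(Level.ANY, null);
  }

  public static void main(String[] args) {
    // Upstream container already released, but we still know its node.
    System.out.println(pickHint("container_001", false,
        Optional.of("node17.example.com"), Optional.of("/rack-3")));
    // prints: HOST(node17.example.com)
  }
}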

In some informal experiments, node locality does look improved: e.g. TPC-DS q69 at 
1TB went from 781 locality misses to 116 (see the attached output from Rajesh's 
tool).

However, I haven't found any examples where the improved locality has an effect on 
run-time. I think this is because the OUTPUT_BYTES for the ONE_TO_ONE edges I've 
found in Hive are just too small to make a difference (tens of MBs). 

Here are some possibilities for moving forward:
1. Try the same experiments on TPC-DS at 10TB and see whether the higher data 
volume makes the latency from disk access more pronounced.
2. Try some experiments on TPC-H. Perhaps ONE_TO_ONE edges play a more important 
role in that workload.
3. Demonstrate the locality improvements, with the understanding that they won't 
immediately improve existing benchmarks (perhaps because existing designs avoided 
ONE_TO_ONE edges due to the locality problems?). If the experiments look good, we 
at least open up some design space for future decisions.

Any suggestions on how to proceed?

> OneToOne Edge: Scheduling misses due to released containers
> -----------------------------------------------------------
>
>                 Key: TEZ-3872
>                 URL: https://issues.apache.org/jira/browse/TEZ-3872
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Gopal V
>         Attachments: tpcds_q69_1000_after.txt, tpcds_q69_1000_before.txt
>
>
> https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/rm/TaskSchedulerManager.java#L477
> That's where it decides between using the container or the node/racks - it does not 
> record the hosts/racks for the container, so the container affinity ignores 
> node-affinity fallbacks.
> https://github.com/apache/tez/blob/master/tez-dag/src/main/java/org/apache/tez/dag/app/rm/YarnTaskSchedulerService.java#L986
> Inside the YARN task scheduling impl, this only picks up the host if the 
> container is being held at the moment, not if it has been released - it also 
> has no checks for in-use containers.
> TaskSchedulerManager can grab ta.containerNodeId directly off the attempt 
> information to get the host info as well as the container info.
> This needs a new allocateTask API which has container, host, rack in the 
> order of scheduling preference.
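
For what it's worth, a rough sketch of the shape such an allocateTask variant could 
take (the signature below is only a guess at the proposal, not an existing Tez API):

import org.apache.hadoop.yarn.api.records.ContainerId;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;

// Hypothetical interface, for discussion only: carries the preferred
// container, its host(s) and rack(s) together, in order of scheduling
// preference, so node affinity survives a released container.
public interface LocalityAwareAllocator {
  void allocateTask(Object task,
                    Resource capability,
                    ContainerId preferredContainer, // strongest preference
                    String[] hosts,                 // then the container's node(s)
                    String[] racks,                 // then the node's rack(s)
                    Priority priority,
                    Object containerSignature,
                    Object clientCookie);
}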


