Rohini Palaniswamy commented on TEZ-394:

The scheduling described above would keep performance good and never make it 
worse than MapReduce in any case. But if more capacity is available, it can be 
used for better performance by running other low-priority root tasks in 
parallel. For example, in the above case, even though RootV2 is at priority 3, 
if there is free capacity while RootV1 is running, then tasks of RootV2 should 
also run at the same time. If IntermediateV1 tasks are ready to be launched 
(slow start satisfied), they should be given priority over RootV2 (preemption 
can be added later if necessary).
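
To make the intent concrete, here is a rough sketch of the selection rule, not 
Tez code. The Task class, the pick_tasks function, and the numeric priorities 
for RootV1 (1) and IntermediateV1 (2) are assumptions made for illustration; 
only RootV2's priority 3 comes from the case above. A lower number is assumed 
to mean "scheduled first".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    vertex: str
    priority: int  # assumption: lower number = scheduled first


def pick_tasks(ready, free_slots):
    """Return the ready tasks to launch now, best priority first.

    Low-priority root tasks are launched only with whatever capacity is
    left after higher-priority ready tasks have taken their slots.
    No preemption: tasks that are already running are never touched.
    """
    ordered = sorted(ready, key=lambda t: t.priority)  # stable sort
    return ordered[:max(free_slots, 0)]


# Hypothetical cluster state: one RootV1 task and two RootV2 tasks are ready,
# and three slots are free, so RootV2 runs alongside RootV1.
ready = [Task("RootV2", 3), Task("RootV1", 1), Task("RootV2", 3)]
launched = pick_tasks(ready, free_slots=3)

# Later, an IntermediateV1 task becomes ready (slow start satisfied) while
# only one slot is free: it wins over the waiting RootV2 task.
contended = pick_tasks([Task("RootV2", 3), Task("IntermediateV1", 2)], 1)
```

With spare capacity every ready task is launched; under contention the 
IntermediateV1 task takes the slot ahead of RootV2, which simply waits since 
the sketch does no preemption.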

> Better scheduling for uneven DAGs
> ---------------------------------
>                 Key: TEZ-394
>                 URL: https://issues.apache.org/jira/browse/TEZ-394
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Rohini Palaniswamy
>   Consider a series of joins or group-bys on dataset A with a few datasets, 
> which takes 10 hours, followed by a final join with a dataset X. The vertex 
> that loads dataset X will be one of the top vertices and will be initialized 
> early even though its output is not consumed until the end, after 10 hours. 
> 1) Could either use delayed-start logic for better resource allocation
> 2) Else, if they are started up front, need to handle failure/recovery cases 
> where the nodes that executed the MapTask might have gone down by the time 
> the final join happens. 

This message was sent by Atlassian JIRA
