[ https://issues.apache.org/jira/browse/TEZ-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated TEZ-3270:
-------------------------
    Description: 
One of the scheduling factors is data locality. For a given completed task A,
it is better to run its dependent tasks in B on the same host/container to
reduce network data transfer between the two. In addition, it might be better
to prefer tasks with larger partitions over tasks with smaller partitions. For
example, in the above fair routing diagram, after task A1 has completed, task
B1 and/or task B2 can be scheduled on the same host/container as task A1, and
B2 has higher priority than B1 because P2 is larger than P1.
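The locality and partition-size preference described above could be sketched roughly as follows. This is a minimal, hypothetical Java illustration and not Tez's scheduler API. The names DestinationTask, Placement, placeAfterSourceCompletes, the freeSlots capacity limit and the "ANY" host marker are assumptions made for the example, and the partition sizes are made-up numbers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FairRoutingLocalitySketch {

    // Hypothetical marker meaning "no host preference; run wherever a container is free".
    static final String ANY_HOST = "ANY";

    // Hypothetical view of a destination task and the size of the partition it
    // will fetch from the completed source task.
    record DestinationTask(String id, long partitionBytes) {}

    // Hypothetical placement decision for one destination task.
    record Placement(String taskId, String host) {}

    /**
     * After a source task finishes on sourceHost, order the destination tasks that
     * depend on it by partition size (largest first) and place as many as fit
     * (freeSlots, an assumed capacity limit) on the same host; the rest get no
     * host preference.
     */
    static List<Placement> placeAfterSourceCompletes(
            String sourceHost, int freeSlots, List<DestinationTask> dependents) {
        List<DestinationTask> ordered = new ArrayList<>(dependents);
        // Larger partition first: reading it locally saves the most network transfer.
        ordered.sort(Comparator.comparingLong(DestinationTask::partitionBytes).reversed());

        List<Placement> placements = new ArrayList<>();
        int localPlaced = 0;
        for (DestinationTask task : ordered) {
            if (localPlaced < freeSlots) {
                placements.add(new Placement(task.id(), sourceHost));
                localPlaced++;
            } else {
                placements.add(new Placement(task.id(), ANY_HOST));
            }
        }
        return placements;
    }

    public static void main(String[] args) {
        // A1 has completed on host1; B1 reads partition P1, B2 reads the larger P2
        // (sizes are illustrative only).
        List<DestinationTask> dependents = List.of(
                new DestinationTask("B1", 100L),
                new DestinationTask("B2", 300L));

        // With room for only one more task on host1, B2 gets the local slot.
        for (Placement p : placeAfterSourceCompletes("host1", 1, dependents)) {
            System.out.println(p.taskId() + " -> " + p.host());
        }
    }
}
```

Running main places B2 on host1 and leaves B1 without a host preference, mirroring the B2-before-B1 ordering in the example. The freeSlots limit is there only to show the "and/or" case, where not every dependent task fits on the source task's host.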

  was:
The scheduling considers the following factors:

* Destination tasks’ dependency on source tasks, as defined by the routing policy.
* Data locality.

In the regular scatter-gather routing policy, each destination task depends on
all source tasks. If slowstart is configured to be less than 1.0, destination
tasks can be started once a portion of the source tasks have completed, and
they can fetch data from all of those completed source tasks.

In fair routing, a destination task might depend on only a subset of source
tasks, so there is no point in scheduling a destination task if none of the
source tasks it depends on have completed.
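The dependency rule above can be expressed as two small checks. This is a minimal Java sketch under stated assumptions: the method names slowStartReached and worthScheduling, the single slowstart fraction and the task-to-source mapping in main are hypothetical and do not correspond to Tez classes or configuration keys. worthScheduling encodes only the necessary condition stated above, not a full scheduling policy.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FairRoutingDependencySketch {

    /**
     * Scatter-gather: every destination task depends on all source tasks. With a
     * slowstart fraction below 1.0, destinations may start once that fraction of
     * source tasks has completed. (Simplified: a single hypothetical threshold.)
     */
    static boolean slowStartReached(int completedSources, int totalSources,
                                    double slowStartFraction) {
        if (totalSources == 0) {
            return true; // nothing to wait for
        }
        // Compare fractions directly to avoid rounding a threshold count up.
        return (double) completedSources / totalSources >= slowStartFraction;
    }

    /**
     * Fair routing: a destination task depends on only a subset of source tasks,
     * so it is only worth scheduling once at least one of those sources has
     * completed. This is a necessary condition, not a complete policy.
     */
    static boolean worthScheduling(Set<String> sourcesOfDestination,
                                   Set<String> completedSources) {
        for (String source : sourcesOfDestination) {
            if (completedSources.contains(source)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical mapping from destination tasks to the source tasks they read.
        Map<String, Set<String>> dependsOn = Map.of(
                "B1", Set.of("A1", "A2"),
                "B2", Set.of("A3", "A4"));
        Set<String> completed = Set.of("A1");

        // Fair routing: B1 has a completed source (A1), B2 has none yet.
        for (String dest : List.of("B1", "B2")) {
            System.out.println(dest + " worth scheduling: "
                    + worthScheduling(dependsOn.get(dest), completed));
        }

        // Scatter-gather over 4 sources with 1 completed: 0.25 suffices, 0.5 does not.
        System.out.println(slowStartReached(completed.size(), 4, 0.25));
        System.out.println(slowStartReached(completed.size(), 4, 0.5));
    }
}
```

Keeping the two checks separate reflects the point above: under scatter-gather any completed source is useful to every destination task, while under fair routing only completed sources in a destination task's own subset are.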


> Scheduling policy in fair routing
> ---------------------------------
>
>                 Key: TEZ-3270
>                 URL: https://issues.apache.org/jira/browse/TEZ-3270
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Ming Ma

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)