[
https://issues.apache.org/jira/browse/TEZ-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093956#comment-16093956
]
Siddharth Seth commented on TEZ-3770:
-------------------------------------
bq. It tries to schedule new containers for tasks that match its priority
before trying to schedule the highest priority task first. This avoids hanging
onto unused, lower priority containers because higher priority requests are
pending (see TEZ-3535).
If I'm reading the code right, new containers which cannot be assigned
immediately are released? Pending requests are removed as soon as a container
is assigned, so YARN will not end up allocating an unused container again
(other than the regular timing races on the protocol).
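To make the question concrete, here is a minimal sketch of the new-container flow as I read it (class and method names are mine, not the patch's):
{code:java}
// Hypothetical sketch of the new-container path being asked about above.
// The pending-request handling, releaseToRM and assign are illustrative
// names, not the patch's API.
import java.util.ArrayDeque;
import java.util.Queue;

class NewContainerFlowSketch {
  interface Container { }
  interface TaskRequest { boolean matches(Container c); }

  private final Queue<TaskRequest> pending = new ArrayDeque<>();

  /** Called for each container newly allocated by the RM. */
  void onContainerAllocated(Container container) {
    TaskRequest match = findMatch(container);
    if (match == null) {
      // Cannot be assigned immediately: release instead of holding it, so an
      // unused lower-priority container is not kept while higher-priority
      // requests are pending.
      releaseToRM(container);
      return;
    }
    // Remove the pending request as soon as the container is assigned, so the
    // RM does not allocate another container for the same request (modulo the
    // usual timing races on the AM-RM protocol).
    pending.remove(match);
    assign(container, match);
  }

  private TaskRequest findMatch(Container c) {
    for (TaskRequest r : pending) {
      if (r.matches(c)) {
        return r;
      }
    }
    return null;
  }

  private void releaseToRM(Container c) { /* e.g. release via the AM-RM client */ }
  private void assign(Container c, TaskRequest r) { /* launch the task attempt */ }
}
{code}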
bq. New task allocation requests are first matched against idle containers
before requesting resources from the RM. This cuts down on AM-RM protocol churn.
Not sure if priority is being considered while doing this, i.e. is it possible
there's a pending higher priority request which has not yet been allocated to
an idle container (primarily races in timing)? I think this is handled, since
an attempt is made to allocate a container the moment the task assigned to it
is de-allocated.
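The kind of check I have in mind is roughly the following (simplified, hypothetical names; not the patch's code):
{code:java}
// Simplified illustration of giving pending higher-priority requests first
// shot at an idle container before matching a newly arrived request.
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class IdleContainerMatchSketch {
  static class Request { int priority; }   // lower value = higher priority
  static class Container { }

  // Pending requests keyed by priority; firstKey() is the highest priority.
  private final NavigableMap<Integer, List<Request>> pending = new TreeMap<>();

  /** Try to satisfy a new request from an idle container before asking the RM. */
  boolean tryAssignIdle(Request incoming, List<Container> idleContainers) {
    // If a higher-priority request is still waiting, do not hand the idle
    // container to the newer, lower-priority request.
    if (!pending.isEmpty() && pending.firstKey() < incoming.priority) {
      return false; // fall through to a normal RM request
    }
    for (Iterator<Container> it = idleContainers.iterator(); it.hasNext(); ) {
      Container c = it.next();
      if (matches(c, incoming)) {
        it.remove();
        return true; // reuse the idle container; no AM-RM churn
      }
    }
    return false;
  }

  private boolean matches(Container c, Request r) {
    return true; // placeholder: resource / locality checks elided
  }
}
{code}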
bq. Task requests for tasks that are DAG-descendants of pending task requests
will not be allocated to help reduce priority inversions that could lead to
preemption.
Is this broken for newly assigned containers?
On the patch itself.
DagAwareYarnTaskScheduler
- TaskRequest oldRequest = requests.put -> Is it possible for oldRequest to be
non-null? There should only be a single request to allocate a single attempt.
- incrVertexTaskCount - lowerStat.allowedVertices.andNot(d); <- Would be nice
to have some more documentation or an example of how this ends up working (see
the BitSet sketch after this list). Does it rely on the way priorities are
assigned, or on the kind of topological sort? When reading this, it seems to
block off a large chunk of requests at a lower priority.
- Different code paths for the allocation of a delayed container and when a new
task request comes in. Assuming this is a result of trying not to place a YARN
request if a container can be assigned immediately? Not sure if more re-use is
possible across the various assign methods.
- RequestPriorityStats - the javadoc on descendants is a little confusing. It
mentions a single vertex, but I think this gets set for every vertex at the
same priority level. The default out-of-box behaviour will always generate
different vertices at different priority levels at the moment; the old
behaviour was to assign the same priority when the distance from the root was
the same. Is moving back to the old behaviour an option, given descendant
information is now known?
- Didn't go into enough detail to figure out whether an attempt is made to run
through an entire tree before moving over to an unrelated tree.
- In tryAssignReuseContainer - if a container cannot be assigned immediately,
will it be released? Should this decision be based on headroom / pending
requests (headroom is very often incorrect; preemption is meant to take care of
that)? e.g. after a task failure there's a new request; if the container cannot
be re-used for this request and capacity is available in YARN, it may make
sense to hold on to the container.
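Regarding the incrVertexTaskCount / allowedVertices question above, this is a toy BitSet walk-through of how I read the intent (vertex indices and the stats shape are simplified; this is not the patch's code):
{code:java}
// Toy walk-through of allowedVertices.andNot(descendants): once a vertex has
// pending tasks, its DAG descendants are blocked off at lower priority levels.
import java.util.BitSet;

public class DescendantBlockingExample {
  public static void main(String[] args) {
    // Small DAG: v0 -> v1 -> v2, with vertex indices 0..2.
    BitSet descendantsOfV0 = new BitSet();
    descendantsOfV0.set(1);
    descendantsOfV0.set(2);

    // Stats for a lower-priority level start with every vertex allowed.
    BitSet allowedAtLowerPriority = new BitSet();
    allowedAtLowerPriority.set(0, 3);

    // When v0 gains pending tasks at a higher priority, every descendant of
    // v0 becomes disallowed at the lower priority levels.
    allowedAtLowerPriority.andNot(descendantsOfV0);

    // Prints {0}: v1 and v2 are no longer allowed until v0's requests drain.
    System.out.println(allowedAtLowerPriority);
  }
}
{code}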
DagInfo - Should getVertexDescendants be exposed as a method, or just the
vertices and the relationships between them? Whoever wants to use this can set
up their own representation; the bit representation could be a helper. The
vertex relationships can likely be used for more than just the list of
descendants.
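Something along these lines is what I had in mind (a rough sketch with made-up names, with descendants derived as a helper rather than exposed directly):
{code:java}
// Rough sketch only: expose the vertices and their edges from DagInfo, and
// derive the descendant BitSet as a helper. Names are hypothetical.
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;
import java.util.Set;

interface DagInfoSketch {
  int getTotalVertices();

  /** Direct downstream vertices (by index) of the given vertex. */
  Set<Integer> getChildren(int vertexIndex);

  /** Helper derived from the edges: all transitive descendants as a BitSet. */
  default BitSet getVertexDescendants(int vertexIndex) {
    BitSet result = new BitSet(getTotalVertices());
    Deque<Integer> work = new ArrayDeque<>();
    work.push(vertexIndex);
    while (!work.isEmpty()) {
      for (int child : getChildren(work.pop())) {
        if (!result.get(child)) {
          result.set(child);
          work.push(child);
        }
      }
    }
    return result;
  }
}
{code}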
TaskSchedulerContext - Instead of exposing getVertexIndexForTask(Object), I
think a better option is to provide an interface for the requesting task itself
(a TaskRequest instead of an Object). That can expose relevant information
directly, instead of requiring an additional call to get this from
TaskSchedulerContext.
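i.e. something roughly like the following (a hypothetical shape, not a concrete proposal for the exact methods):
{code:java}
// Hypothetical request-side interface: the scheduler sees the DAG context on
// the request itself rather than calling back into TaskSchedulerContext.
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;

interface SchedulerTaskRequest {
  Object getTask();        // the opaque task handle the AM already passes in
  int getVertexIndex();    // DAG position, no extra context lookup needed
  Priority getPriority();
  Resource getCapability();
  String[] getHosts();
  String[] getRacks();
}
{code}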
> DAG-aware YARN task scheduler
> -----------------------------
>
> Key: TEZ-3770
> URL: https://issues.apache.org/jira/browse/TEZ-3770
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: TEZ-3770.001.patch
>
>
> There are cases where priority alone does not convey the relationship between
> tasks, and this can cause problems when scheduling or preempting tasks. If
> the YARN task scheduler was aware of the relationship between tasks then it
> could make smarter decisions when trying to assign tasks to containers or
> preempt running tasks to schedule pending tasks.