[
https://issues.apache.org/jira/browse/TEZ-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093956#comment-16093956
]
Siddharth Seth commented on TEZ-3770:
-------------------------------------
bq. It tries to schedule new containers for tasks that match its priority
before trying to schedule the highest priority task first. This avoids hanging
onto unused, lower priority containers because higher priority requests are
pending (see TEZ-3535).
If I'm reading the code right, new containers which cannot be assigned
immediately are released? Pending requests are removed as soon as a container
is assigned, so YARN will not end up allocating an unused container again
(other than the regular timing races on the protocol).
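To make the question concrete, here is a minimal sketch of the new-container flow as I read it (class and method names are mine, not the patch's):
{code:java}
// Hypothetical sketch of the new-container path being asked about above.
// The pending-request handling, releaseToRM and assign are illustrative
// names, not the patch's API.
import java.util.ArrayDeque;
import java.util.Queue;

class NewContainerFlowSketch {
  interface Container { }
  interface TaskRequest { boolean matches(Container c); }

  private final Queue<TaskRequest> pending = new ArrayDeque<>();

  /** Called for each container newly allocated by the RM. */
  void onContainerAllocated(Container container) {
    TaskRequest match = findMatch(container);
    if (match == null) {
      // Cannot be assigned immediately: release instead of holding it, so an
      // unused lower-priority container is not kept while higher-priority
      // requests are pending.
      releaseToRM(container);
      return;
    }
    // Remove the pending request as soon as the container is assigned, so the
    // RM does not allocate another container for the same request (modulo the
    // usual timing races on the AM-RM protocol).
    pending.remove(match);
    assign(container, match);
  }

  private TaskRequest findMatch(Container c) {
    for (TaskRequest r : pending) {
      if (r.matches(c)) {
        return r;
      }
    }
    return null;
  }

  private void releaseToRM(Container c) { /* e.g. release via the AM-RM client */ }
  private void assign(Container c, TaskRequest r) { /* launch the task attempt */ }
}
{code}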
bq. New task allocation requests are first matched against idle containers
before requesting resources from the RM. This cuts down on AM-RM protocol churn.
Not sure if priority is being considered while doing this, i.e. is it possible
there's a pending higher priority request which has not yet been allocated to
an idle container (primarily races in timing)? I think this is handled, since
an attempt is made to allocate a container the moment the task assigned to it
is de-allocated.
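The kind of check I have in mind is roughly the following (simplified, hypothetical names; not the patch's code):
{code:java}
// Simplified illustration of giving pending higher-priority requests first
// shot at an idle container before matching a newly arrived request.
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class IdleContainerMatchSketch {
  static class Request { int priority; }   // lower value = higher priority
  static class Container { }

  // Pending requests keyed by priority; firstKey() is the highest priority.
  private final NavigableMap<Integer, List<Request>> pending = new TreeMap<>();

  /** Try to satisfy a new request from an idle container before asking the RM. */
  boolean tryAssignIdle(Request incoming, List<Container> idleContainers) {
    // If a higher-priority request is still waiting, do not hand the idle
    // container to the newer, lower-priority request.
    if (!pending.isEmpty() && pending.firstKey() < incoming.priority) {
      return false; // fall through to a normal RM request
    }
    for (Iterator<Container> it = idleContainers.iterator(); it.hasNext(); ) {
      Container c = it.next();
      if (matches(c, incoming)) {
        it.remove();
        return true; // reuse the idle container; no AM-RM churn
      }
    }
    return false;
  }

  private boolean matches(Container c, Request r) {
    return true; // placeholder: resource / locality checks elided
  }
}
{code}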
bq. Task requests for tasks that are DAG-descendants of pending task requests
will not be allocated to help reduce priority inversions that could lead to
preemption.
Is this broken for newly assigned containers?
On the patch itself.
DagAwareYarnTaskScheduler
- TaskRequest oldRequest = requests.put -> Is it possible for oldRequest to be
non-null? There should only be a single request to allocate a single attempt.
- incrVertexTaskCount - lowerStat.allowedVertices.andNot(d); <- Would be nice
to have some more documentation or an example of how this ends up working (see
the BitSet sketch after this list). Does it rely on the way priorities are
assigned, or on the kind of topological sort? When reading this, it seems to
block off a large chunk of requests at a lower priority.
- Different code paths for the allocation of a delayed container and when a new
task request comes in. Assuming this is a result of trying not to place a YARN
request if a container can be assigned immediately? Not sure if more re-use is
possible across the various assign methods.
- RequestPriorityStats - the javadoc on descendants is a little confusing. It
mentions a single vertex, but I think this gets set for every vertex at the
same priority level. The default out-of-box behaviour will always generate
different vertices at different priority levels at the moment; the old
behaviour was to assign the same priority when the distance from the root was
the same. Is moving back to the old behaviour an option, given descendant
information is now known?
- Didn't go into enough detail to figure out whether an attempt is made to run
through an entire tree before moving over to an unrelated tree.
- In tryAssignReuseContainer - if a container cannot be assigned immediately,
will it be released? Should this decision be based on headroom / pending
requests (headroom is very often incorrect; preemption is meant to take care of
that)? e.g. after a task failure there's a new request; if the container cannot
be re-used for this request and capacity is available in YARN, it may make
sense to hold on to the container.
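Regarding the incrVertexTaskCount / allowedVertices question above, this is a toy BitSet walk-through of how I read the intent (vertex indices and the stats shape are simplified; this is not the patch's code):
{code:java}
// Toy walk-through of allowedVertices.andNot(descendants): once a vertex has
// pending tasks, its DAG descendants are blocked off at lower priority levels.
import java.util.BitSet;

public class DescendantBlockingExample {
  public static void main(String[] args) {
    // Small DAG: v0 -> v1 -> v2, with vertex indices 0..2.
    BitSet descendantsOfV0 = new BitSet();
    descendantsOfV0.set(1);
    descendantsOfV0.set(2);

    // Stats for a lower-priority level start with every vertex allowed.
    BitSet allowedAtLowerPriority = new BitSet();
    allowedAtLowerPriority.set(0, 3);

    // When v0 gains pending tasks at a higher priority, every descendant of
    // v0 becomes disallowed at the lower priority levels.
    allowedAtLowerPriority.andNot(descendantsOfV0);

    // Prints {0}: v1 and v2 are no longer allowed until v0's requests drain.
    System.out.println(allowedAtLowerPriority);
  }
}
{code}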
DagInfo - Should getVertexDescendants be exposed as a method, or just the
vertices and the relationships between them? Whoever wants to use this can set
up their own representation; the bit representation could be a helper. The
vertex relationships can likely be used for more than just the list of
descendants.
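Something along these lines is what I had in mind (a rough sketch with made-up names, with descendants derived as a helper rather than exposed directly):
{code:java}
// Rough sketch only: expose the vertices and their edges from DagInfo, and
// derive the descendant BitSet as a helper. Names are hypothetical.
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;
import java.util.Set;

interface DagInfoSketch {
  int getTotalVertices();

  /** Direct downstream vertices (by index) of the given vertex. */
  Set<Integer> getChildren(int vertexIndex);

  /** Helper derived from the edges: all transitive descendants as a BitSet. */
  default BitSet getVertexDescendants(int vertexIndex) {
    BitSet result = new BitSet(getTotalVertices());
    Deque<Integer> work = new ArrayDeque<>();
    work.push(vertexIndex);
    while (!work.isEmpty()) {
      for (int child : getChildren(work.pop())) {
        if (!result.get(child)) {
          result.set(child);
          work.push(child);
        }
      }
    }
    return result;
  }
}
{code}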
TaskSchedulerContext - Instead of exposing getVertexIndexForTask(Object), I
think a better option is to provide an interface for the requesting task itself
(a TaskRequest instead of an Object). That can expose relevant information
directly, instead of requiring an additional call to get this from
TaskSchedulerContext.
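i.e. something roughly like the following (a hypothetical shape, not a concrete proposal for the exact methods):
{code:java}
// Hypothetical request-side interface: the scheduler sees the DAG context on
// the request itself rather than calling back into TaskSchedulerContext.
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;

interface SchedulerTaskRequest {
  Object getTask();        // the opaque task handle the AM already passes in
  int getVertexIndex();    // DAG position, no extra context lookup needed
  Priority getPriority();
  Resource getCapability();
  String[] getHosts();
  String[] getRacks();
}
{code}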
> DAG-aware YARN task scheduler
> -----------------------------
>
> Key: TEZ-3770
> URL: https://issues.apache.org/jira/browse/TEZ-3770
> Project: Apache Tez
> Issue Type: New Feature
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: TEZ-3770.001.patch
>
>
> There are cases where priority alone does not convey the relationship between
> tasks, and this can cause problems when scheduling or preempting tasks. If
> the YARN task scheduler was aware of the relationship between tasks then it
> could make smarter decisions when trying to assign tasks to containers or
> preempt running tasks to schedule pending tasks.