Jason Lowe created TEZ-3935:
-------------------------------
Summary: DAG aware scheduler should release unassigned new
containers rather than hold them
Key: TEZ-3935
URL: https://issues.apache.org/jira/browse/TEZ-3935
Project: Apache Tez
Issue Type: Bug
Reporter: Jason Lowe
Assignee: Jason Lowe
I saw a case for a very large job with many containers where the DAG aware
scheduler was getting behind on assigning containers. Newly assigned
containers were not finding any matching request, so they were queued for reuse
processing. However, it took so long to get through all of the task and
container events that the container allocations expired before the container
was finally assigned and attempted to be launched.
Newly assigned containers are assigned to their matching requests, even if that
violates the DAG priorities, so it should be safe to simply release these if no
tasks could be found to use them. The matching request has either been removed
or already satisfied with a reused container. Besides, if we can't find any
tasks to take the newly assigned container then it is very likely we have
plenty of reusable containers already, and keeping more containers just makes
the job a resource hog on the cluster.
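
The proposed behavior could be sketched roughly as follows. This is only an
illustration and not the actual Tez scheduler code: the class, the
TaskRequest and NewContainer types, and the memory-only matching rule are all
hypothetical stand-ins. The point is the final branch, where a newly assigned
container that finds no task is released immediately rather than queued for
reuse processing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; none of these types are real Tez classes.
class NewContainerReleaseSketch {

    /** Stand-in for a pending task request (assumption: matched on memory only). */
    static final class TaskRequest {
        final String taskId;
        final int memoryMb;
        TaskRequest(String taskId, int memoryMb) {
            this.taskId = taskId;
            this.memoryMb = memoryMb;
        }
    }

    /** Stand-in for a container newly assigned by the resource manager. */
    static final class NewContainer {
        final String containerId;
        final int memoryMb;
        NewContainer(String containerId, int memoryMb) {
            this.containerId = containerId;
            this.memoryMb = memoryMb;
        }
    }

    private final Deque<TaskRequest> pendingRequests = new ArrayDeque<>();
    final List<String> launched = new ArrayList<>();
    final List<String> released = new ArrayList<>();

    void addRequest(TaskRequest request) {
        pendingRequests.add(request);
    }

    /** Called once for each newly assigned container. */
    void onNewContainerAllocated(NewContainer container) {
        // Look for any pending request that fits in this container.
        TaskRequest match = null;
        for (TaskRequest request : pendingRequests) {
            if (request.memoryMb <= container.memoryMb) {
                match = request;
                break;
            }
        }
        if (match != null) {
            pendingRequests.remove(match);
            launched.add(container.containerId + "->" + match.taskId);
        } else {
            // No task can use it: release now instead of holding it for reuse,
            // so the allocation cannot expire while waiting in a queue.
            released.add(container.containerId);
        }
    }
}
```

With one pending request and two newly assigned containers, the first
container is launched with the task and the second is released right away.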