Jason Lowe created TEZ-3491:
-------------------------------
Summary: Tez job can hang due to container priority inversion
Key: TEZ-3491
URL: https://issues.apache.org/jira/browse/TEZ-3491
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.7.1
Reporter: Jason Lowe
Priority: Critical
If the Tez AM receives a container at a lower priority than the highest-priority
task being requested, it fails to assign the container to any task. In
addition, if the container is new, the AM refuses to release it while there are
any pending tasks. If the higher-priority requests take too long to be
fulfilled (e.g., because the lower-priority containers are filling the queue),
YARN eventually expires the unused lower-priority containers because they
were never launched. The Tez AM never re-requests these lower-priority
containers, so the job hangs: the AM is waiting for containers from the RM
that the RM has already sent and expired.
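To make the sequence concrete, here is a toy simulation in Python. It is a hypothetical sketch, not Tez or YARN code: the function `simulate`, its parameters, and the tick-based timing are all invented for illustration, and priorities follow the convention that a smaller number means higher priority. One priority-1 task and two priority-5 tasks are pending; the RM grants the priority-5 containers immediately and the priority-1 container only after `high_pri_delay` ticks.

```python
from collections import Counter


def simulate(expiry_ticks, high_pri_delay, ticks=20):
    """Toy model of the TEZ-3491 hang (hypothetical, not Tez code).

    Smaller number = higher priority. One priority-1 task and two priority-5
    tasks are pending. The RM grants the priority-5 containers at tick 0 and
    the priority-1 container at tick `high_pri_delay`.
    Returns (tasks still pending, RM requests still outstanding, held containers).
    """
    pending = Counter({1: 1, 5: 2})  # AM side: tasks waiting for a container
    rm_asks = Counter({1: 1, 5: 2})  # RM side: requests not yet granted
    held = []                        # AM side: [priority, age] of unlaunched containers

    def try_assign(priority):
        # A container is used only if it matches the highest pending priority;
        # a lower-priority container is not assigned to any task.
        live = [p for p, n in pending.items() if n > 0]
        if live and priority == min(live):
            pending[priority] -= 1
            return True
        return False

    for t in range(ticks):
        # RM grants every outstanding low-priority request immediately and the
        # high-priority request only once t reaches high_pri_delay.
        for p in list(rm_asks):
            if rm_asks[p] > 0 and (p != 1 or t >= high_pri_delay):
                for _ in range(rm_asks[p]):
                    if not try_assign(p):
                        held.append([p, 0])  # new container, tasks pending: not released
                rm_asks[p] = 0
        # AM retries the containers it is holding.
        held = [c for c in held if not try_assign(c[0])]
        # RM expires containers that stay allocated but unlaunched too long.
        for c in held:
            c[1] += 1
        held = [c for c in held if c[1] < expiry_ticks]
        # Modeled bug: expired containers are not re-requested, so rm_asks
        # never grows again.

    return +pending, +rm_asks, held
```

With `simulate(expiry_ticks=3, high_pri_delay=5)` the two priority-5 containers expire before the priority-1 container arrives, leaving both priority-5 tasks pending with no outstanding RM request and nothing held, so nothing can ever make progress. With `expiry_ticks=10` the held containers survive until the priority-1 task launches, after which both are assigned and no tasks remain pending.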
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)