Kuhu Shukla created TEZ-4027:
--------------------------------
Summary: DagAwareYarnTaskScheduler can miscompute blocked vertices
and cause a hang
Key: TEZ-4027
URL: https://issues.apache.org/jira/browse/TEZ-4027
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.9.1, 0.10.0
Reporter: Kuhu Shukla
Assignee: Kuhu Shukla
In a scenario where there are retroactive failures and the YARN queue is too
full to allow new container assignments, the scheduler can miscompute the
blocked vertex set because it flips bits only up to the length of the bitset,
which may not reflect the total number of vertices. As a result, no
preemption occurs and the DAG hangs.
{code}
@GuardedBy("DagAwareYarnTaskScheduler.this")
BitSet createVertexBlockedSet() {
  BitSet blocked = new BitSet();
  Entry<Priority, RequestPriorityStats> entry = priorityStats.lastEntry();
  if (entry != null) {
    RequestPriorityStats stats = entry.getValue();
    blocked.or(stats.allowedVertices);
    blocked.flip(0, blocked.length());
    blocked.or(stats.descendants);
  }
  return blocked;
}
{code}
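To make the miscomputation concrete, here is a minimal stand-alone sketch, not the Tez code itself. It relies on the documented behavior of java.util.BitSet.length(), which returns the index of the highest set bit plus one rather than a vertex count. The class BlockedSetSketch, both method names, and the numVertices parameter are hypothetical names introduced only for illustration, and flipping across the vertex count is one assumed way to avoid the problem, not necessarily the fix adopted for this issue.

```java
import java.util.BitSet;

// Hypothetical illustration only; none of these names exist in Tez.
final class BlockedSetSketch {

    // Mirrors the logic quoted above: the flip range ends at the highest
    // set bit of the allowed set, so any vertex with a larger index that
    // is not allowed never gets marked as blocked.
    static BitSet blockedByLength(BitSet allowed, BitSet descendants) {
        BitSet blocked = new BitSet();
        blocked.or(allowed);
        blocked.flip(0, blocked.length());
        blocked.or(descendants);
        return blocked;
    }

    // Assumed alternative: flip across the full vertex range.
    // numVertices is a hypothetical parameter holding the DAG's vertex count.
    static BitSet blockedByVertexCount(BitSet allowed, BitSet descendants,
                                       int numVertices) {
        BitSet blocked = new BitSet(numVertices);
        blocked.or(allowed);
        blocked.flip(0, numVertices);
        blocked.or(descendants);
        return blocked;
    }

    public static void main(String[] args) {
        // Five vertices (0..4); only vertices 0 and 1 are allowed.
        BitSet allowed = new BitSet();
        allowed.set(0);
        allowed.set(1);
        BitSet descendants = new BitSet();

        // length() is 2 here, so only bits 0..1 are flipped.
        System.out.println(blockedByLength(allowed, descendants));         // prints {}
        System.out.println(blockedByVertexCount(allowed, descendants, 5)); // prints {2, 3, 4}
    }
}
```

In this example the length-based version reports no blocked vertices even though vertices 2 through 4 are not allowed, which matches the described symptom of preemption never being triggered.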