Kuhu Shukla created TEZ-4027:
--------------------------------

             Summary: DagAwareYarnTaskScheduler can miscompute blocked vertices and cause a hang
                 Key: TEZ-4027
                 URL: https://issues.apache.org/jira/browse/TEZ-4027
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.9.1, 0.10.0
            Reporter: Kuhu Shukla
            Assignee: Kuhu Shukla


In a scenario with retroactive failures, where the YARN queue is too full to 
allow new container assignments, the scheduler can miscompute the blocked 
vertex set. It flips bits only up to the length of the bitset (the index of 
the highest set bit plus one), which may not reflect the total number of 
vertices. As a result, no preemption happens and the DAG hangs.

{code}
    @GuardedBy("DagAwareYarnTaskScheduler.this")
    BitSet createVertexBlockedSet() {
      BitSet blocked = new BitSet();
      Entry<Priority, RequestPriorityStats> entry = priorityStats.lastEntry();
      if (entry != null) {
        RequestPriorityStats stats = entry.getValue();
        blocked.or(stats.allowedVertices);
        // Problem: length() is the highest set bit + 1, not the vertex count,
        // so vertices above the highest allowed vertex are never flipped.
        blocked.flip(0, blocked.length());
        blocked.or(stats.descendants);
      }
      return blocked;
    }
{code}
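
The effect can be reproduced outside Tez with plain java.util.BitSet. The 
sketch below is only an illustration, not the actual patch: the class name, 
both method names, and the numVertices parameter are hypothetical, and it 
assumes the total vertex count is available to the caller. With five vertices 
of which only 0 and 1 are allowed, flipping up to length() leaves vertices 
2 to 4 unblocked, while flipping up to the vertex count blocks them.

```java
import java.util.BitSet;

// Hypothetical stand-alone sketch; not part of DagAwareYarnTaskScheduler.
public class BlockedSetSketch {

    // Mirrors the logic in the report: the flip range ends at length(),
    // which is the index of the highest set bit plus one.
    static BitSet blockedUsingLength(BitSet allowed, BitSet descendants) {
        BitSet blocked = new BitSet();
        blocked.or(allowed);
        blocked.flip(0, blocked.length());
        blocked.or(descendants);
        return blocked;
    }

    // Assumed alternative: flip across an explicitly supplied vertex count
    // (numVertices is a hypothetical parameter).
    static BitSet blockedUsingVertexCount(BitSet allowed, BitSet descendants,
                                          int numVertices) {
        BitSet blocked = new BitSet(numVertices);
        blocked.or(allowed);
        blocked.flip(0, numVertices);
        blocked.or(descendants);
        return blocked;
    }

    public static void main(String[] args) {
        int numVertices = 5;
        BitSet allowed = new BitSet(numVertices);
        allowed.set(0);
        allowed.set(1);
        BitSet descendants = new BitSet(numVertices);

        // Vertices 2..4 are not allowed, yet they are missing from the result.
        System.out.println(blockedUsingLength(allowed, descendants));      // prints {}
        // With the explicit count, vertices 2..4 are reported as blocked.
        System.out.println(blockedUsingVertexCount(allowed, descendants,
                                                   numVertices));          // prints {2, 3, 4}
    }
}
```

Whether the real fix should take the count from the DAG's vertex list or from 
another source is left open here; the sketch only shows why length() is the 
wrong upper bound for the flip.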



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
