[ 
https://issues.apache.org/jira/browse/TEZ-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724150#comment-16724150
 ] 

Jason Lowe commented on TEZ-4027:
---------------------------------

Thanks for the patch!  +1 lgtm.  Committing this.

> DagAwareYarnTaskScheduler can miscompute blocked vertices and cause a hang
> --------------------------------------------------------------------------
>
>                 Key: TEZ-4027
>                 URL: https://issues.apache.org/jira/browse/TEZ-4027
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.9.1, 0.10.0
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Major
>         Attachments: TEZ-4027.001.patch, TEZ-4027.002.patch
>
>
> In a scenario with retroactive failures where the YARN queue is too full to 
> allow new container assignments, the scheduler can miscompute the blocked 
> vertex set. It flips bits only up to the length of the bitset, which may not 
> reflect the total number of vertices. As a result, no preemption occurs and 
> the DAG hangs.
> {code}
> @GuardedBy("DagAwareYarnTaskScheduler.this")
> BitSet createVertexBlockedSet() {
>   BitSet blocked = new BitSet();
>   Entry<Priority, RequestPriorityStats> entry = priorityStats.lastEntry();
>   if (entry != null) {
>     RequestPriorityStats stats = entry.getValue();
>     blocked.or(stats.allowedVertices);
>     blocked.flip(0, blocked.length());
>     blocked.or(stats.descendants);
>   }
>   return blocked;
> }
> {code}
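
The miscomputation described above can be reproduced with java.util.BitSet alone. The sketch below is illustrative only and is not taken from the attached patches: the class name BlockedSetDemo, both helper methods, and the numVertices parameter are hypothetical, and the descendants step is left out to isolate the flip. BitSet.length() returns the index of the highest set bit plus one, so flipping over [0, length()) never marks vertices above the highest allowed vertex as blocked.

```java
import java.util.BitSet;

public class BlockedSetDemo {
    // Mirrors the flip in the quoted snippet: the range is bounded by
    // BitSet.length(), i.e. the highest set bit plus one.
    static BitSet blockedUsingLength(BitSet allowed) {
        BitSet blocked = new BitSet();
        blocked.or(allowed);
        blocked.flip(0, blocked.length());
        return blocked;
    }

    // Hypothetical alternative: the caller supplies the total vertex count,
    // so every vertex that is not allowed ends up in the blocked set.
    static BitSet blockedUsingVertexCount(BitSet allowed, int numVertices) {
        BitSet blocked = new BitSet(numVertices);
        blocked.or(allowed);
        blocked.flip(0, numVertices);
        return blocked;
    }

    public static void main(String[] args) {
        // Assume a DAG with 5 vertices where only vertices 0 and 1 are allowed.
        BitSet allowed = new BitSet();
        allowed.set(0);
        allowed.set(1);

        // length() is 2 here, so vertices 2..4 are never flipped on.
        System.out.println(blockedUsingLength(allowed));         // {}
        System.out.println(blockedUsingVertexCount(allowed, 5)); // {2, 3, 4}
    }
}
```

With an empty blocked set, nothing looks eligible for preemption, which is consistent with the hang reported in the description.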



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
