[ https://issues.apache.org/jira/browse/TEZ-4027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723615#comment-16723615 ]
Kuhu Shukla commented on TEZ-4027:
----------------------------------
[~jlowe], [~jeagles] request for comments/review. Thanks a lot!
> DagAwareYarnTaskScheduler can miscompute blocked vertices and cause a hang
> --------------------------------------------------------------------------
>
> Key: TEZ-4027
> URL: https://issues.apache.org/jira/browse/TEZ-4027
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.9.1, 0.10.0
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Major
> Attachments: TEZ-4027.001.patch, TEZ-4027.002.patch
>
>
> In a scenario where there are retroactive failures and the YARN queue is
> full, so that no new containers can be assigned, the scheduler can
> miscompute the blocked vertex set. It flips bits only up to the logical
> length of the BitSet (the index of its highest set bit plus one), which may
> not reflect the total number of vertices. As a result, no preemption occurs
> and the DAG hangs.
> {code}
> @GuardedBy("DagAwareYarnTaskScheduler.this")
> BitSet createVertexBlockedSet() {
>   BitSet blocked = new BitSet();
>   Entry<Priority, RequestPriorityStats> entry = priorityStats.lastEntry();
>   if (entry != null) {
>     RequestPriorityStats stats = entry.getValue();
>     blocked.or(stats.allowedVertices);
>     blocked.flip(0, blocked.length());
>     blocked.or(stats.descendants);
>   }
>   return blocked;
> }
> {code}
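The effect of `blocked.flip(0, blocked.length())` can be reproduced in isolation. The sketch below is a standalone illustration, not the scheduler code or the attached patch: the class `BlockedSetDemo`, the methods `buggyBlocked`/`fixedBlocked`, and the `numVertices` parameter are hypothetical. It assumes a DAG of 5 vertices where only vertices 0 and 1 are allowed. Because `BitSet.length()` is 2 in that case, the flip never touches vertices 2 to 4, so they are not reported as blocked. Flipping across the full vertex count is one plausible correction, shown here under that assumption.

```java
import java.util.BitSet;

public class BlockedSetDemo {
    // Mirrors the quoted logic: the flip range ends at blocked.length(),
    // i.e. just past the highest allowed vertex, not at the vertex count.
    static BitSet buggyBlocked(BitSet allowed, BitSet descendants) {
        BitSet blocked = new BitSet();
        blocked.or(allowed);
        blocked.flip(0, blocked.length());
        blocked.or(descendants);
        return blocked;
    }

    // Hypothetical fix: the caller supplies the total number of vertices,
    // so every vertex that is not allowed ends up marked as blocked.
    static BitSet fixedBlocked(BitSet allowed, BitSet descendants, int numVertices) {
        BitSet blocked = new BitSet(numVertices);
        blocked.or(allowed);
        blocked.flip(0, numVertices);
        blocked.or(descendants);
        return blocked;
    }

    public static void main(String[] args) {
        int numVertices = 5;          // assumed DAG size for the demo
        BitSet allowed = new BitSet();
        allowed.set(0);               // vertices 0 and 1 may be scheduled
        allowed.set(1);
        BitSet descendants = new BitSet(); // no descendants, for simplicity

        System.out.println("buggy: " + buggyBlocked(allowed, descendants)); // buggy: {}
        System.out.println("fixed: " + fixedBlocked(allowed, descendants, numVertices)); // fixed: {2, 3, 4}
    }
}
```

With the original logic the blocked set comes out empty, so the scheduler sees nothing to preempt for, which matches the hang described above. The same happens whenever `allowedVertices` is empty: `length()` is 0 and the flip is a no-op.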
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)