[
https://issues.apache.org/jira/browse/TEZ-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14112543#comment-14112543
]
Siddharth Seth commented on TEZ-1494:
-------------------------------------
A vertex is initialized once all of the following hold: 1) its initializer is
complete, 2) its edges are set up, and 3) its parallelism is not -1. All three
conditions would hold for Reducer3, so it would be initialized and would allow
Map5 (the dependent vertex) to start.
We currently have no way of knowing whether a Vertex will change its
parallelism, or whether we should block for such an operation. Alternatively,
we would have to update the downstream tasks with the new parallelism
information, which may be the better way to deal with this, since parallelism
could potentially change multiple times at a later point.
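To make the three conditions concrete, here is a minimal Java sketch of the
check as described above. The class, method, and constant names are
hypothetical and are not taken from the Tez code base; the point is only that
nothing in the check captures "this vertex may still change its parallelism".

```java
// Hypothetical illustration of the initialization conditions described in
// this comment. None of these names come from the Tez sources.
final class VertexInitSketch {
    // Assumed sentinel: -1 means the parallelism has not been determined yet.
    static final int UNKNOWN_PARALLELISM = -1;

    // Returns true when all three conditions from the comment are met.
    // Note that a vertex whose parallelism is set now but may be changed
    // later (e.g. Reducer3) still passes this check.
    static boolean canInitialize(boolean initializerComplete,
                                 boolean edgesSetUp,
                                 int parallelism) {
        return initializerComplete
                && edgesSetUp
                && parallelism != UNKNOWN_PARALLELISM;
    }
}
```

With these assumptions, `VertexInitSketch.canInitialize(true, true, 10)` is
true, so a dependent vertex such as Map5 would be allowed to start even if the
value 10 is later revised.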
> DAG hangs waiting for ShuffleManager.getNextInput()
> ---------------------------------------------------
>
> Key: TEZ-1494
> URL: https://issues.apache.org/jira/browse/TEZ-1494
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Labels: performance
> Attachments: TEZ-1494-DAG.dot
>
>
> Attaching the DAG and the stack trace of the hung process.
> Thread 30071: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, line=186 (Interpreted frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await() @bci=42, line=2043 (Interpreted frame)
>  - java.util.concurrent.LinkedBlockingQueue.take() @bci=29, line=442 (Interpreted frame)
>  - org.apache.tez.runtime.library.shuffle.common.impl.ShuffleManager.getNextInput() @bci=67, line=610 (Interpreted frame)
>  - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.moveToNextInput() @bci=26, line=176 (Interpreted frame)
>  - org.apache.tez.runtime.library.common.readers.UnorderedKVReader.next() @bci=30, line=117 (Interpreted frame)