Syed Shameerur Rahman created TEZ-4140:
------------------------------------------
Summary: TEZ Recovery: Discrepancy In Scheduling Vertices During Vertex Recovery
Key: TEZ-4140
URL: https://issues.apache.org/jira/browse/TEZ-4140
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.9.2, 0.9.1, 0.8.4, 0.9.0, 0.8.2
Reporter: Syed Shameerur Rahman
Assignee: Syed Shameerur Rahman
Fix For: 0.10.0, 0.9.3
Attachments: DAG.png
*Issue*:
During vertex recovery, the initialization stage of a vertex is skipped if the following events are seen in the recovery data:
1) VertexInputInitializerEvent
2) VertexReconfigureDoneEvent
The initialization stage is skipped by replacing the vertex's VertexManagerPlugin (e.g. ShuffleVertexManager, CustomVertexManager, etc.) with NoOpVertexManager. There are a couple of issues with replacing the VertexManagerPlugin with NoOpVertexManager:
1) A VertexManagerPlugin's work is only complete once the tasks in its vertex have been launched. Using NoOpVertexManager without checking whether that vertex's tasks were launched in the previous run can therefore lead to discrepancies in deciding when, and how many, tasks should be launched in that vertex during recovery.
2) Maintaining vertex dependency:
Suppose we have two vertices v1 and v2, where v2 depends on v1 (v1 ---> v2). If, for some reason, v1 was not able to skip the initialization stage but v2 was, there is a chance that v2 gets scheduled before v1, since NoOpVertexManager is used for v2.
The problem described above is the one I encountered. A DAG is attached for reference (DAG.png):
In the DAG, Reducer 7 depends on Reducer 6. During Tez recovery, for some reason, Reducer 6's initialization stage was not skipped, whereas Reducer 7's initialization stage was skipped and NoOpVertexManager was used instead of ShuffleVertexManager. As a result, all the tasks in Reducer 7 were launched without waiting for Reducer 6 to complete. Initially it was decided that Reducer 6 would launch 14 tasks, and based on that information the tasks launched in Reducer 7 waited for 14 shuffle inputs. Later, auto-reduce parallelism adjusted the number of tasks in Reducer 6 to 1, but Reducer 7's tasks were not aware of this and kept waiting for 14 shuffle inputs when in fact there was only 1, so the query got stuck. This can also lead to a deadlock when the number of containers is limited and Reducer 7's tasks end up occupying all of them.
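To make the current rule concrete, here is a minimal Java sketch of the skip decision as described above. It is an illustrative model, not Tez's VertexImpl code: the class CurrentRecoveryRule, the RecoveredVertex holder and its boolean flags are hypothetical names, and the flag values for Reducer 6 and Reducer 7 are made up only to reproduce the situation where the parent cannot skip initialization but the child can.

```java
import java.util.List;

/** Hypothetical model of the current skip-initialization rule; not Tez's actual code. */
public class CurrentRecoveryRule {

    /** What the recovery data recorded for one vertex (illustrative fields only). */
    static final class RecoveredVertex {
        final String name;
        final boolean sawInputInitializerEvent;
        final boolean sawReconfigureDoneEvent;

        RecoveredVertex(String name, boolean sawInputInitializerEvent, boolean sawReconfigureDoneEvent) {
            this.name = name;
            this.sawInputInitializerEvent = sawInputInitializerEvent;
            this.sawReconfigureDoneEvent = sawReconfigureDoneEvent;
        }
    }

    /** Current rule: only the vertex's own recovered events are consulted. */
    static boolean skipsInitialization(RecoveredVertex v) {
        // When true, the VertexManagerPlugin would be replaced by NoOpVertexManager.
        return v.sawInputInitializerEvent && v.sawReconfigureDoneEvent;
    }

    public static void main(String[] args) {
        // Hypothetical flags: parent Reducer 6 cannot skip, child Reducer 7 can.
        RecoveredVertex reducer6 = new RecoveredVertex("Reducer 6", true, false);
        RecoveredVertex reducer7 = new RecoveredVertex("Reducer 7", true, true);
        for (RecoveredVertex v : List.of(reducer6, reducer7)) {
            System.out.println(v.name + " skips initialization: " + skipsInitialization(v));
        }
    }
}
```

Because the check looks only at a vertex's own recovered events, nothing in it prevents Reducer 7 from skipping initialization (and being scheduled by NoOpVertexManager) while its parent Reducer 6 still goes through normal initialization.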
*Proposed Solution:*
In addition to the existing VertexInputInitializerEvent and VertexReconfigureDoneEvent conditions, introduce two more conditions:
1) Check whether tasks were launched in the vertex in the previous run before replacing the VertexManagerPlugin with NoOpVertexManager.
2) All parent vertices must have skipped the initialization stage before the child vertex can skip it. This is required to maintain vertex dependencies.
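The proposed conditions can be sketched as a single pass over the vertices in topological order, so that every parent's decision is known before its children are evaluated. This is a hypothetical model, not the actual patch: ProposedRecoveryRule, RecoveredVertex, tasksLaunchedInPreviousRun and decide are illustrative names, and both the topological-order input and the assumption that "tasks launched in the previous run" can be read from recovery data are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the proposed skip-initialization rule; not Tez's actual code. */
public class ProposedRecoveryRule {

    /** What the recovery data recorded for one vertex (illustrative fields only). */
    static final class RecoveredVertex {
        final String name;
        final boolean sawInputInitializerEvent;
        final boolean sawReconfigureDoneEvent;
        final boolean tasksLaunchedInPreviousRun; // assumed to be derivable from recovery data
        final List<String> parents;

        RecoveredVertex(String name, boolean sawInputInitializerEvent, boolean sawReconfigureDoneEvent,
                        boolean tasksLaunchedInPreviousRun, List<String> parents) {
            this.name = name;
            this.sawInputInitializerEvent = sawInputInitializerEvent;
            this.sawReconfigureDoneEvent = sawReconfigureDoneEvent;
            this.tasksLaunchedInPreviousRun = tasksLaunchedInPreviousRun;
            this.parents = parents;
        }
    }

    /** Returns, per vertex name, whether it may skip initialization; input must be topologically sorted. */
    static Map<String, Boolean> decide(List<RecoveredVertex> topologicalOrder) {
        Map<String, Boolean> skip = new HashMap<>();
        for (RecoveredVertex v : topologicalOrder) {
            // Existing condition: both events were seen in the recovery data.
            boolean eventsSeen = v.sawInputInitializerEvent && v.sawReconfigureDoneEvent;
            // New condition 2: every parent must already have skipped initialization.
            boolean parentsSkipped = true;
            for (String parent : v.parents) {
                if (!skip.getOrDefault(parent, false)) { // an unknown parent counts as "did not skip"
                    parentsSkipped = false;
                    break;
                }
            }
            // New condition 1: tasks were launched in this vertex in the previous run.
            skip.put(v.name, eventsSeen && v.tasksLaunchedInPreviousRun && parentsSkipped);
        }
        return skip;
    }

    public static void main(String[] args) {
        // Two-vertex slice of the attached DAG with hypothetical flags.
        RecoveredVertex reducer6 = new RecoveredVertex("Reducer 6", true, false, true, List.of());
        RecoveredVertex reducer7 = new RecoveredVertex("Reducer 7", true, true, true, List.of("Reducer 6"));
        Map<String, Boolean> skip = decide(List.of(reducer6, reducer7));
        System.out.println("Reducer 6 skips initialization: " + skip.get("Reducer 6"));
        System.out.println("Reducer 7 skips initialization: " + skip.get("Reducer 7"));
    }
}
```

With these flags neither reducer skips initialization: Reducer 6 fails the event check, and Reducer 7 then fails the parent check, so it keeps its ShuffleVertexManager instead of being switched to NoOpVertexManager. Treating an unknown parent as "did not skip" is a conservative choice made for this sketch.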