[jira] [Commented] (TEZ-4140) TEZ Recovery: Discrepancy In Scheduling Vertices During Vertex Recovery

Syed Shameerur Rahman (Jira) Wed, 08 Apr 2020 05:32:56 -0700


    [ 
https://issues.apache.org/jira/browse/TEZ-4140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078237#comment-17078237
 ]


Syed Shameerur Rahman commented on TEZ-4140:
--------------------------------------------

[~jeagles] [~bikassaha] [~jlowe] [~zjffdu] Can you please review?

> TEZ Recovery: Discrepancy In Scheduling Vertices During Vertex Recovery
> -----------------------------------------------------------------------
>
>                 Key: TEZ-4140
>                 URL: https://issues.apache.org/jira/browse/TEZ-4140
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.8.2, 0.9.0, 0.8.4, 0.9.1, 0.9.2
>            Reporter: Syed Shameerur Rahman
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>             Fix For: 0.10.0, 0.9.3
>
>         Attachments: DAG.png, TEZ-4140.01.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Issue*:
> During vertex recovery, the initialization stage of a vertex is skipped if
> 1) VertexInputInitializerEvent
> 2) VertexReconfigureDoneEvent
> are seen in the recovery data. The initialization stage is skipped by 
> replacing the vertex's VertexManagerPlugin (e.g. ShuffleVertexManager, 
> CustomVertexManager) with NoOpVertexManager. There are a couple of issues 
> with replacing the VertexManagerPlugin with NoOpVertexManager:
> 1) A VertexManagerPlugin has completed its work only after the tasks of its 
> vertex have been launched. Using NoOpVertexManager without checking whether 
> tasks for that vertex were launched in the previous run can therefore lead 
> to a discrepancy in deciding when and how many tasks should be launched in 
> that vertex during recovery.
> 2) Maintaining vertex dependency:
> Suppose there are two vertices v1 and v2, where v2 depends on v1 (v1 
> ---> v2). If for some reason v1 could not skip its initialization stage 
> but v2 could, then v2 might get scheduled before v1, since v2 is using 
> NoOpVertexManager.
> I have run into the problem described above; a DAG is attached for 
> reference:
>  !DAG.png! 
> In the DAG, Reducer 7 depends on Reducer 6. For some reason, during Tez 
> recovery Reducer 6's initialization stage was not skipped, whereas 
> Reducer 7's initialization stage was skipped and NoOpVertexManager was used 
> instead of ShuffleVertexManager. NoOpVertexManager went on to launch all the 
> tasks in Reducer 7 without waiting for Reducer 6 to complete. Initially it 
> was decided that Reducer 6 would launch 14 tasks, so the tasks launched in 
> Reducer 7 waited for 14 shuffle inputs. Later, due to AutoReduce 
> parallelism, the number of tasks in Reducer 6 was adjusted to 1. Reducer 7's 
> tasks did not know about this and kept waiting for 14 shuffle inputs when 
> there was actually only 1, hence the query was stuck. This can also lead to 
> a deadlock when the number of containers is limited and Reducer 7 ends up 
> using all of them.
> *Proposed Solution:*
> In addition to the conditions on VertexInputInitializerEvent and 
> VertexReconfigureDoneEvent, introduce two more conditions:
> 1) Check whether tasks were launched in the vertex in the previous run before 
> replacing its VertexManagerPlugin with NoOpVertexManager.
> 2) All parent vertices should have skipped the initialization stage before 
> a child vertex does. This is required to maintain vertex dependency.
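
The proposed conditions can be illustrated with a minimal sketch. This is not
the code in TEZ-4140.01.patch and it does not use Tez's classes: the
RecoveredVertex type, its fields, and canSkipInitialization are hypothetical
names made up for this example. The existing event-based condition is modelled
as a single boolean, and the values given to Reducer 6 and Reducer 7 in main
are assumptions chosen to mirror the scenario described in the issue.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only; none of these types are Tez classes. */
class VertexRecoverySketch {

    /** What the recovery data says about one vertex (hypothetical fields). */
    static final class RecoveredVertex {
        final boolean recoveryEventsSeen;         // existing event-based condition
        final boolean tasksLaunchedInPreviousRun; // proposed condition 1
        final List<String> parents;               // names of upstream vertices

        RecoveredVertex(boolean recoveryEventsSeen,
                        boolean tasksLaunchedInPreviousRun,
                        String... parents) {
            this.recoveryEventsSeen = recoveryEventsSeen;
            this.tasksLaunchedInPreviousRun = tasksLaunchedInPreviousRun;
            this.parents = Arrays.asList(parents);
        }
    }

    /**
     * Returns true only if the vertex meets the existing condition, had tasks
     * launched in the previous run, and every parent can also skip
     * initialization (proposed condition 2). Assumes the graph is acyclic.
     */
    static boolean canSkipInitialization(String name,
                                         Map<String, RecoveredVertex> dag) {
        RecoveredVertex v = dag.get(name);
        if (v == null) {
            throw new IllegalArgumentException("unknown vertex: " + name);
        }
        if (!v.recoveryEventsSeen || !v.tasksLaunchedInPreviousRun) {
            return false;
        }
        for (String parent : v.parents) {
            if (!canSkipInitialization(parent, dag)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, RecoveredVertex> dag = new HashMap<>();
        // Assumed state: Reducer 6 could not skip initialization.
        dag.put("Reducer 6", new RecoveredVertex(false, false));
        // Assumed state: Reducer 7 alone would qualify, but depends on Reducer 6.
        dag.put("Reducer 7", new RecoveredVertex(true, true, "Reducer 6"));

        System.out.println("Reducer 7 can skip init: "
                + canSkipInitialization("Reducer 7", dag));
    }
}
```

Under these assumptions Reducer 7 is not allowed to skip initialization, so in
this model it would keep its original VertexManagerPlugin rather than being
switched to NoOpVertexManager ahead of Reducer 6.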



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
