[ 
https://issues.apache.org/jira/browse/TEZ-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Turner Eagles resolved TEZ-2456.
-----------------------------------------
    Resolution: Won't Fix

Closing recovery v1 bugs/features.

> Refactor recovery event logging to ensure it meet the recovery event spec
> -------------------------------------------------------------------------
>
>                 Key: TEZ-2456
>                 URL: https://issues.apache.org/jira/browse/TEZ-2456
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>            Priority: Major
>              Labels: Recovery
>
> Currently there is no spec for recovery event logging, which makes recovery
> fragile to code changes. This jira tries to define the spec and refactor the
> recovery event logging to ensure it meets the spec. [~hitesh] Please help
> review the following spec I drafted.
> *DAG*
> * DAGSubmitted/DAGInitializedEvent/DAGStartedEvent must be logged only once;
> they should not be logged again when the DAG is recovered.
> * DAGFinishedEvent may be logged multiple times. (DAG moves from SUCCEEDED
> to ERROR? Should we ignore this?)
> * VertexFinishedEvent should be logged before DAGFinishedEvent
> *Vertex* 
> * RootInputDataInformation must be logged before VertexInitializedEvent
> * DataMovement must be logged before TaskFinishedEvent
> * TaskFinishedEvent must be logged before VertexFinishedEvent
> * VertexInitializedEvent / VertexStartedEvent should only be logged once
> and should not be logged again when the vertex is recovered.
> * VertexFinishedEvent may be logged multiple times (e.g. Vertex moves from
> SUCCEEDED to FAILED).
> * VertexParallelismUpdatedEvent must be logged before TaskStartedEvent
> * A VertexFinishedEvent (SUCCEEDED) must be preceded by at least n
> TaskFinishedEvent (SUCCEEDED)
> *Task*
> * If there is no TaskStartedEvent, TaskFinishedEvent may still be logged (e.g.
> Task is killed in NEW). The current behavior is that TaskFinishedEvent is not
> logged if there is no TaskStartedEvent.
> * TaskStartedEvent should only be logged once. It should not be logged again
> when the task is recovered.
> * TaskFinishedEvent may be logged multiple times (e.g. Task moves from
> SUCCEEDED to FAILED)
> * A TaskFinishedEvent (SUCCEEDED) must be preceded by at least one
> TaskAttemptFinishedEvent (SUCCEEDED)
>
> *TaskAttempt*
> * If there is no TaskAttemptStartedEvent, TaskAttemptFinishedEvent may still
> be logged (e.g. TaskAttempt is killed in NEW). The current behavior is that
> TaskAttemptFinishedEvent is not logged if there is no TaskAttemptStartedEvent.
> * TaskAttemptStartedEvent should only be logged once. It should not be logged
> again when the task attempt is recovered.
> * TaskAttemptFinishedEvent may be logged multiple times (e.g. TaskAttempt
> moves from SUCCEEDED to FAILED).
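
Rules of this kind can be checked mechanically against a sequence of logged
events. The following is a minimal Python sketch of such a check, not Tez code:
the Event class, the slash-separated entity ids (e.g. "dag1/v1/t1/a1") and the
check_log function are all hypothetical names introduced for illustration. It
covers only three rules from the draft spec: the log-once events, the
requirement that a succeeded TaskFinishedEvent follows a succeeded
TaskAttemptFinishedEvent, and VertexParallelismUpdatedEvent preceding
TaskStartedEvent.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Event types the draft spec says must be logged only once per entity.
ONCE_ONLY = {
    "DAGSubmitted", "DAGInitializedEvent", "DAGStartedEvent",
    "VertexInitializedEvent", "VertexStartedEvent",
    "TaskStartedEvent", "TaskAttemptStartedEvent",
}


@dataclass(frozen=True)
class Event:
    kind: str                     # event type name, as used in the spec
    entity: str                   # hypothetical id such as "dag1/v1/t1/a1"
    state: Optional[str] = None   # e.g. "SUCCEEDED" for finished events


def parent(entity: str) -> str:
    """Return the enclosing entity id (assumes slash-separated ids)."""
    return entity.rsplit("/", 1)[0]


def check_log(events: List[Event]) -> List[str]:
    """Return one message per rule violation, in log order."""
    violations: List[str] = []
    seen_once: Set[Tuple[str, str]] = set()
    tasks_with_succeeded_attempt: Set[str] = set()
    vertices_with_started_task: Set[str] = set()

    for i, ev in enumerate(events):
        # Rule: log-once events must not repeat for the same entity.
        if ev.kind in ONCE_ONLY:
            key = (ev.kind, ev.entity)
            if key in seen_once:
                violations.append(f"{i}: {ev.kind} repeated for {ev.entity}")
            seen_once.add(key)

        if ev.kind == "TaskAttemptFinishedEvent" and ev.state == "SUCCEEDED":
            tasks_with_succeeded_attempt.add(parent(ev.entity))
        elif ev.kind == "TaskFinishedEvent" and ev.state == "SUCCEEDED":
            # Rule: needs an earlier succeeded attempt of the same task.
            if ev.entity not in tasks_with_succeeded_attempt:
                violations.append(
                    f"{i}: {ev.entity} succeeded without a succeeded attempt")
        elif ev.kind == "TaskStartedEvent":
            vertices_with_started_task.add(parent(ev.entity))
        elif ev.kind == "VertexParallelismUpdatedEvent":
            # Rule: parallelism updates must precede any task start.
            if ev.entity in vertices_with_started_task:
                violations.append(
                    f"{i}: parallelism of {ev.entity} updated after task start")

    return violations
```

A log that satisfies these three rules yields an empty list. The rules that
allow finished events to be logged multiple times are deliberately left
unchecked, since the draft still marks some of those cases as open questions.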



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
