[
https://issues.apache.org/jira/browse/TEZ-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Turner Eagles resolved TEZ-2456.
-----------------------------------------
Resolution: Won't Fix
Closing recovery v1 bugs/features.
> Refactor recovery event logging to ensure it meets the recovery event spec
> --------------------------------------------------------------------------
>
> Key: TEZ-2456
> URL: https://issues.apache.org/jira/browse/TEZ-2456
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Priority: Major
> Labels: Recovery
>
> Currently we don't have a spec for recovery event logging, so recovery is
> fragile to code changes. This jira tries to define the spec and refactor the
> recovery event logging to ensure it meets the spec. [~hitesh] Please help
> review the following spec I drafted.
> *DAG*
> * DAGSubmitted/DAGInitializedEvent/DAGStartedEvent must be logged only once.
> They should not be logged again when the DAG is recovered.
> * DAGFinishedEvent may be logged multiple times. (e.g. DAG moves from
> SUCCEEDED to ERROR? Should we ignore this?)
> * VertexFinishedEvent should be logged before DAGFinishedEvent
> *Vertex*
> * RootInputDataInformation must be logged before VertexInitializedEvent
> * DataMovement must be logged before TaskFinishedEvent
> * TaskFinishedEvent must be logged before VertexFinishedEvent
> * VertexInitializedEvent / VertexStartedEvent should only be logged once and
> should not be logged again when the vertex is recovered.
> * VertexFinishedEvent may be logged multiple times (e.g. Vertex moves from
> SUCCEEDED to FAILED).
> * VertexParallelismUpdatedEvent must be logged before TaskStartedEvent
> * A VertexFinishedEvent (SUCCEEDED) must be preceded by at least n
> TaskFinishedEvent (SUCCEEDED) events.
> *Task*
> * If there’s no TaskStartedEvent, TaskFinishedEvent may still be logged (e.g.
> Task is killed in NEW). The current behavior is that TaskFinishedEvent won’t
> be logged if there’s no TaskStartedEvent.
> * TaskStartedEvent should only be logged once. It should not be logged again
> when the task is recovered.
> * TaskFinishedEvent may be logged multiple times (e.g. Task moves from
> SUCCEEDED to FAILED).
> * A TaskFinishedEvent (SUCCEEDED) must be preceded by at least one
> TaskAttemptFinishedEvent (SUCCEEDED).
>
> *TaskAttempt*
> * If there’s no TaskAttemptStartedEvent, TaskAttemptFinishedEvent may still
> be logged (e.g. TaskAttempt is killed in NEW). The current behavior is that
> TaskAttemptFinishedEvent won’t be logged if there’s no TaskAttemptStartedEvent.
> * TaskAttemptStartedEvent should only be logged once. It should not be logged
> again when the task attempt is recovered.
> * TaskAttemptFinishedEvent may be logged multiple times (e.g. TaskAttempt
> moves from SUCCEEDED to FAILED).
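
Two of the drafted rules can be sketched as a small log checker: the "logged only once" rule for the Initialized/Started events, and the rule that a TaskFinishedEvent (SUCCEEDED) needs an earlier TaskAttemptFinishedEvent (SUCCEEDED). This is only an illustration under assumptions. The Event class, its fields, and RecoveryLogCheck are hypothetical and not part of Tez; event types are plain strings spelled as in the draft above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RecoveryLogCheck {
    /** Hypothetical flattened log record; NOT a Tez class. */
    static final class Event {
        final String type;     // event name as written in the draft spec
        final String id;       // id of the DAG/vertex/task/attempt
        final String parentId; // for an attempt: the owning task id (may be null)
        final String state;    // final state for *Finished* events, else null

        Event(String type, String id, String parentId, String state) {
            this.type = type;
            this.id = id;
            this.parentId = parentId;
            this.state = state;
        }
    }

    // Event types the draft says must not be logged again after recovery.
    static final Set<String> ONCE_ONLY = new HashSet<>(Arrays.asList(
        "DAGSubmitted", "DAGInitializedEvent", "DAGStartedEvent",
        "VertexInitializedEvent", "VertexStartedEvent",
        "TaskStartedEvent", "TaskAttemptStartedEvent"));

    /** Scans the log in order and returns a description of each violation. */
    static List<String> check(List<Event> log) {
        List<String> violations = new ArrayList<>();
        Set<String> seenOnce = new HashSet<>();
        Set<String> tasksWithSucceededAttempt = new HashSet<>();
        for (Event e : log) {
            // Rule 1: once-only events may appear at most once per entity.
            if (ONCE_ONLY.contains(e.type) && !seenOnce.add(e.type + "/" + e.id)) {
                violations.add("duplicate " + e.type + " for " + e.id);
            }
            boolean succeeded = "SUCCEEDED".equals(e.state);
            if (succeeded && e.type.equals("TaskAttemptFinishedEvent")) {
                tasksWithSucceededAttempt.add(e.parentId);
            }
            // Rule 2: a succeeded task needs an earlier succeeded attempt.
            if (succeeded && e.type.equals("TaskFinishedEvent")
                    && !tasksWithSucceededAttempt.contains(e.id)) {
                violations.add("TaskFinishedEvent (SUCCEEDED) for " + e.id
                    + " has no earlier succeeded attempt");
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        List<Event> log = Arrays.asList(
            new Event("TaskStartedEvent", "t1", "v1", null),
            new Event("TaskStartedEvent", "t1", "v1", null), // logged again
            new Event("TaskFinishedEvent", "t1", "v1", "SUCCEEDED"));
        for (String v : check(log)) {
            System.out.println(v);
        }
    }
}
```

Repeated *Finished* events and a TaskFinishedEvent without a TaskStartedEvent are deliberately accepted, since the draft allows both. The remaining ordering rules (for example, DataMovement before TaskFinishedEvent) could be added in the same single pass.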