[ https://issues.apache.org/jira/browse/TEZ-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Zhang updated TEZ-2456:
----------------------------
Description:
Currently we don't have a spec for recovery event logging, which makes recovery
fragile to code changes. This JIRA tries to define the spec and refactor the
recovery event logging to ensure it meets the spec. [~hitesh] Please help review
the following spec I drafted.
*DAG*
* DAGSubmitted/DAGInitializedEvent/DAGStartedEvent must be logged exactly once;
they should not be logged again when the DAG is recovered.
* DAGFinishedEvent may be logged multiple times (e.g. the DAG moves from
SUCCEEDED to ERROR? Should we ignore this case?).
* VertexFinishedEvent should be logged before DAGFinishedEvent
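To make the DAG rules concrete, here is a minimal Python sketch of a checker over one DAG's recovery log. It assumes, purely for illustration, that the log is a list of (event_type, entity_id) tuples with event names as plain strings modelled on the names in the spec; this is not the actual Tez recovery log format, and check_dag_log is a hypothetical helper. It reads "VertexFinishedEvent should be logged before DAGFinishedEvent" as "no VertexFinishedEvent after the first DAGFinishedEvent", which is only one possible interpretation.

```python
from typing import List, Tuple

# Hypothetical log entry: (event_type, entity_id), e.g. ("DAGStartedEvent", "dag_1").
Event = Tuple[str, str]

# Events that, per the spec, must be logged exactly once per DAG.
ONCE_ONLY = ("DAGSubmittedEvent", "DAGInitializedEvent", "DAGStartedEvent")


def check_dag_log(log: List[Event]) -> List[str]:
    """Return human-readable spec violations for one DAG's log (empty if none)."""
    violations = []
    types = [event_type for event_type, _ in log]
    for name in ONCE_ONLY:
        count = types.count(name)
        if count != 1:
            violations.append(f"{name} logged {count} times, expected 1")
    # DAGFinishedEvent may repeat; only its first occurrence is used as the cutoff.
    if "DAGFinishedEvent" in types:
        cutoff = types.index("DAGFinishedEvent")
        for event_type, entity_id in log[cutoff + 1:]:
            if event_type == "VertexFinishedEvent":
                violations.append(
                    f"VertexFinishedEvent for {entity_id} logged after DAGFinishedEvent")
    return violations


ok = [("DAGSubmittedEvent", "dag_1"), ("DAGInitializedEvent", "dag_1"),
      ("DAGStartedEvent", "dag_1"), ("VertexFinishedEvent", "v1"),
      ("DAGFinishedEvent", "dag_1"), ("DAGFinishedEvent", "dag_1")]
print(check_dag_log(ok))  # []
```

The repeated DAGFinishedEvent in the example is accepted on purpose, since the spec allows it.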
*Vertex*
* RootInputDataInformation must be logged before VertexInitializedEvent
* DataMovement must be logged before TaskFinishedEvent
* TaskFinishedEvent must be logged before VertexFinishedEvent
* VertexInitializedEvent/VertexStartedEvent should be logged only once; they
should not be logged again when the vertex is recovered.
* VertexFinishedEvent may be logged multiple times (e.g. the vertex moves from
SUCCEEDED to FAILED).
* VertexParallelismUpdatedEvent must be logged before TaskStartedEvent
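The Vertex ordering rules can be checked per vertex in the same spirit. This sketch uses a hypothetical check_vertex_order helper and an assumed log format in which task-level events carry the id of their owning vertex. It covers two of the rules above and treats "A must be logged before B" as "no A after the first B of the same vertex". Rules between events that may repeat, such as TaskFinishedEvent before VertexFinishedEvent, would need a more careful definition because of SUCCEEDED to FAILED transitions.

```python
from typing import Dict, List, Tuple

# Hypothetical log entry: (event_type, vertex_id). Task-level events such as
# TaskStartedEvent are assumed to carry the id of their owning vertex.
Event = Tuple[str, str]

# (earlier, later) pairs taken from the Vertex section of the spec.
VERTEX_ORDER_RULES = [
    ("RootInputDataInformation", "VertexInitializedEvent"),
    ("VertexParallelismUpdatedEvent", "TaskStartedEvent"),
]


def check_vertex_order(log: List[Event]) -> List[str]:
    """Flag an 'earlier' event logged after the first 'later' event of the same vertex."""
    # Index of the first occurrence of each (vertex, event_type) pair.
    first_seen: Dict[Tuple[str, str], int] = {}
    for i, (event_type, vertex) in enumerate(log):
        first_seen.setdefault((vertex, event_type), i)
    violations = []
    for i, (event_type, vertex) in enumerate(log):
        for earlier, later in VERTEX_ORDER_RULES:
            later_idx = first_seen.get((vertex, later))
            if event_type == earlier and later_idx is not None and later_idx < i:
                violations.append(f"{vertex}: {earlier} logged after {later}")
    return violations


log = [("VertexInitializedEvent", "v1"), ("RootInputDataInformation", "v1"),
       ("TaskStartedEvent", "v2")]
print(check_vertex_order(log))
# ['v1: RootInputDataInformation logged after VertexInitializedEvent']
```

Keying the check by vertex means events of different vertices never constrain each other.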
*Task*
* Even if there is no TaskStartedEvent, TaskFinishedEvent may still be logged
(e.g. the task is killed in NEW). The current behavior is that TaskFinishedEvent
is not logged if there is no TaskStartedEvent.
* TaskStartedEvent should be logged only once; it should not be logged again
when the task is recovered.
* TaskFinishedEvent may be logged multiple times (e.g. the task moves from
SUCCEEDED to FAILED).
* TaskAttemptFinishedEvent should be logged before TaskFinishedEvent
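On the recovery side, the Task rules imply that replay must tolerate a TaskFinishedEvent with no TaskStartedEvent, and must let a later TaskFinishedEvent override an earlier one. A minimal sketch, assuming a hypothetical per-task event list of (event_type, payload) tuples and a hypothetical RecoveredTask record; the real Tez recovery code is structured differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RecoveredTask:
    started: bool = False
    final_state: Optional[str] = None  # state from the most recent TaskFinishedEvent
    finished_attempts: List[str] = field(default_factory=list)


def replay_task(events: List[Tuple[str, str]]) -> RecoveredTask:
    """Rebuild one task's state from its logged events, in log order.

    Assumed payloads: a final state for TaskFinishedEvent, an attempt id for
    TaskAttemptFinishedEvent, ignored for TaskStartedEvent.
    """
    task = RecoveredTask()
    for event_type, payload in events:
        if event_type == "TaskStartedEvent":
            task.started = True
        elif event_type == "TaskAttemptFinishedEvent":
            task.finished_attempts.append(payload)
        elif event_type == "TaskFinishedEvent":
            # No TaskStartedEvent is required (e.g. killed in NEW), and a later
            # TaskFinishedEvent overrides an earlier one (e.g. SUCCEEDED -> FAILED).
            task.final_state = payload
    return task


print(replay_task([("TaskFinishedEvent", "KILLED")]))
```

Letting the last TaskFinishedEvent win is what makes logging it multiple times safe for recovery.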
*TaskAttempt*
* Even if there is no TaskAttemptStartedEvent, TaskAttemptFinishedEvent may still
be logged (e.g. the TaskAttempt is killed in NEW). The current behavior is that
TaskAttemptFinishedEvent is not logged if there is no TaskAttemptStartedEvent.
* TaskAttemptStartedEvent should be logged only once; it should not be logged
again when the TaskAttempt is recovered.
* TaskAttemptFinishedEvent may be logged multiple times (e.g. the TaskAttempt
moves from SUCCEEDED to FAILED).
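On the logging side, the TaskAttempt rules amount to a guard: log TaskAttemptStartedEvent at most once, even after recovery, but always log TaskAttemptFinishedEvent. A sketch with a hypothetical AttemptEventLogger class that appends to an in-memory list standing in for the recovery log; it is not the Tez API.

```python
from typing import Iterable, List, Tuple

Event = Tuple[str, str]


class AttemptEventLogger:
    """Hypothetical writer-side guard for one task attempt's recovery events."""

    def __init__(self, sink: List[Event], recovered_events: Iterable[Event] = ()):
        self.sink = sink
        # After recovery, remember whether the started event is already in the log.
        self.started_logged = any(
            event_type == "TaskAttemptStartedEvent" for event_type, _ in recovered_events)

    def on_start(self) -> None:
        # Log TaskAttemptStartedEvent at most once, including across recovery.
        if not self.started_logged:
            self.sink.append(("TaskAttemptStartedEvent", ""))
            self.started_logged = True

    def on_finish(self, state: str) -> None:
        # Always log: allowed without a prior start (killed in NEW) and
        # allowed more than once (e.g. SUCCEEDED then FAILED).
        self.sink.append(("TaskAttemptFinishedEvent", state))


prior = [("TaskAttemptStartedEvent", "")]
log = list(prior)
attempt = AttemptEventLogger(log, recovered_events=prior)
attempt.on_start()  # no-op: already logged before recovery
attempt.on_finish("SUCCEEDED")
attempt.on_finish("FAILED")
print(log)
```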
was:
The same description as above, without the review request to [~hitesh].
> Refactor recovery event logging to ensure it meet the recovery event spec
> -------------------------------------------------------------------------
>
> Key: TEZ-2456
> URL: https://issues.apache.org/jira/browse/TEZ-2456
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)