[ https://issues.apache.org/jira/browse/TEZ-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kuhu Shukla updated TEZ-3857:
-----------------------------
Attachment: TEZ-3857.002.patch
Thank you for the comments, [~jlowe] and [~aplusplus]. I have moved the
leafVertex check down so that it runs just before the unSucceed and
schedulingCausalTA calls, and updated the test to cover this case. This does
leave task.failedAttempts and vertex.failedTaskAttemptCount inconsistent, as
they won't be incremented on this path. We could increment them to be safe, but
they are used for history events and vertex progress, which are not very
relevant once the DAG state machine processes the internal error and fails. Let
me know if you have more thoughts on this.
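To illustrate the ordering described above, here is a minimal, self-contained
sketch. It is not the TaskImpl code from the patch: the class, the enum, the
fields and the method name are all hypothetical stand-ins, and it assumes the
transition returns early when the failed attempt is not the one that succeeded,
so that the leafVertex check is only reached for the successful attempt.

```java
import java.util.HashMap;
import java.util.Map;

public class RetroactiveFailureSketch {
    // Hypothetical, reduced set of task states for illustration only.
    enum TaskState { SUCCEEDED, SCHEDULED, ERROR }

    // Hypothetical stand-in for a task that has already been marked succeeded.
    static final class Task {
        final boolean leafVertex;
        final int successfulAttempt;
        TaskState state = TaskState.SUCCEEDED;
        final Map<Integer, Boolean> attemptDone = new HashMap<>();

        Task(boolean leafVertex, int successfulAttempt) {
            this.leafVertex = leafVertex;
            this.successfulAttempt = successfulAttempt;
        }
    }

    // Handles an attempt failure that arrives after the task succeeded.
    static TaskState onAttemptFailedAfterSuccess(Task task, int failedAttempt) {
        // Assumed early return: a failure of a different (e.g. speculative)
        // attempt must not override the earlier success.
        if (failedAttempt != task.successfulAttempt) {
            task.attemptDone.put(failedAttempt, true);
            return TaskState.SUCCEEDED;
        }
        // Leaf check placed after the early return: only a failure of the
        // attempt that actually succeeded is unexpected for a leaf task.
        if (task.leafVertex) {
            task.state = TaskState.ERROR;
            return task.state;
        }
        // Non-leaf task: undo the success and reschedule (stands in for the
        // unSucceed and rescheduling steps).
        task.state = TaskState.SCHEDULED;
        return task.state;
    }

    public static void main(String[] args) {
        Task leaf = new Task(true, 0);
        // Speculative attempt 1 fails after attempt 0 succeeded.
        System.out.println(onAttemptFailedAfterSuccess(leaf, 1)); // prints SUCCEEDED
        // The successful attempt itself fails retroactively on a leaf task.
        System.out.println(onAttemptFailedAfterSuccess(leaf, 0)); // prints ERROR
    }
}
```

With the check placed first, as in the snippet quoted in the issue description,
the first call would already have put the task into the error state.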
> Tez TaskImpl can throw Invalid state transition for leaf tasks that do Retro
> Active Transition
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-3857
> URL: https://issues.apache.org/jira/browse/TEZ-3857
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Attachments: TEZ-3857.001.patch, TEZ-3857.002.patch
>
>
> {code}
> Invalid event T_ATTEMPT_FAILED on Task task_1234_5678_1_01_000001
> {code}
> The task had more than one running attempt (because of speculative
> execution). One of them succeeded and the task was marked succeeded; the
> second then failed and caused the Task state machine to enter the error
> state, because the task was in a leaf vertex and the transition does the
> following:
> {code}
> if (task.leafVertex) {
>   LOG.error("Unexpected event for task of leaf vertex " +
>       event.getType() + ", taskId: "
>       + task.getTaskId());
>   task.internalError(event.getType());
> }
> {code}
> This JIRA tracks fixing this invalid state.