[ https://issues.apache.org/jira/browse/SPARK-37300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

hujiahua updated SPARK-37300:
-----------------------------
    Description: 
`TaskSchedulerImpl` handles task-finished events in `handleSuccessfulTask` and 
`handleFailedTask`, but in some cases the task is already in a finished state, 
and the task-finished event should then be ignored.

Case description: 
When an executor finishes a task of some stage, the driver receives a 
StatusUpdate event to handle it. At the same time, the driver finds that the 
executor's heartbeat has timed out, so the driver also needs to handle an 
ExecutorLost event concurrently. There is a race condition here, which can 
leave TaskSetManager.successful and TaskSetManager.tasksSuccessful with wrong 
values. A more detailed description and discussion can be found at 
https://issues.apache.org/jira/browse/SPARK-36575 and 
https://github.com/apache/spark/pull/33872
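To make the race concrete, here is a minimal Python sketch under simplifying assumptions. `ToyTaskSetManager`, `TaskInfo`, `replay_race` and the `guard` flag are hypothetical names for illustration, not Spark's Scala API, and the events are replayed sequentially in the problematic order instead of on real threads. With `guard=True`, both handlers ignore an event for a task attempt that is already in a terminal state, which models the check this issue proposes for `handleSuccessfulTask` and `handleFailedTask`.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical toy model; NOT Spark's TaskSchedulerImpl/TaskSetManager code.


class TaskState(Enum):
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"


@dataclass
class TaskInfo:
    tid: int
    index: int
    executor_id: str
    state: TaskState = TaskState.RUNNING

    @property
    def finished(self) -> bool:
        # Any terminal state (success or failure) counts as "already finished".
        return self.state is not TaskState.RUNNING


class ToyTaskSetManager:
    """Heavily simplified stand-in that tracks per-partition success."""

    def __init__(self, num_tasks: int, guard: bool) -> None:
        self.successful = [False] * num_tasks
        self.tasks_successful = 0
        self.pending: list[int] = []          # partitions queued for a retry
        self.task_infos: dict[int, TaskInfo] = {}
        self.guard = guard                    # models the proposed check

    def launch(self, tid: int, index: int, executor_id: str) -> None:
        self.task_infos[tid] = TaskInfo(tid, index, executor_id)

    def handle_successful_task(self, tid: int) -> None:
        info = self.task_infos[tid]
        if self.guard and info.finished:
            return  # stale event: this attempt already reached a final state
        info.state = TaskState.FINISHED
        if not self.successful[info.index]:
            self.successful[info.index] = True
            self.tasks_successful += 1

    def handle_failed_task(self, tid: int) -> None:
        info = self.task_infos[tid]
        if self.guard and info.finished:
            return  # stale event: ignore it
        info.state = TaskState.FAILED
        self.pending.append(info.index)       # re-queue the partition

    def executor_lost(self, executor_id: str) -> None:
        # Attempts still RUNNING on the lost executor are failed and re-queued.
        for tid, info in list(self.task_infos.items()):
            if info.executor_id == executor_id and info.state is TaskState.RUNNING:
                self.handle_failed_task(tid)


def replay_race(guard: bool) -> ToyTaskSetManager:
    tsm = ToyTaskSetManager(num_tasks=1, guard=guard)
    tsm.launch(tid=1, index=0, executor_id="exec-1")
    # 1. exec-1 finishes tid 1, but the success is not yet handled.
    # 2. The heartbeat timeout fires and ExecutorLost is handled first.
    tsm.executor_lost("exec-1")
    # 3. The now-stale success event for tid 1 is finally handled.
    tsm.handle_successful_task(1)
    return tsm


buggy = replay_race(guard=False)
print(buggy.successful, buggy.tasks_successful, buggy.pending)  # [True] 1 [0]
fixed = replay_race(guard=True)
print(fixed.successful, fixed.tasks_successful, fixed.pending)  # [False] 0 [0]
```

In the unguarded replay, the late success overwrites an attempt that ExecutorLost already marked as failed, so partition 0 is counted in `tasks_successful` while also being queued for a retry. With the guard, the stale event is dropped and the counters stay consistent with the attempt's recorded state. The toy model only illustrates the kind of inconsistency involved; it is not meant to reproduce the exact sequence analysed in SPARK-36575.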

  was:
`TaskSchedulerImpl` handles task-finished events in `handleSuccessfulTask` and 
`handleFailedTask`, but in some cases the task may already be in a finished 
state, so we should ignore the task-finished event in this case.

Case description: 
When an executor finishes a task of some stage, the driver receives a 
StatusUpdate event to handle it. At the same time, the driver finds that the 
executor's heartbeat has timed out, so the driver also needs to handle an 
ExecutorLost event concurrently. There is a race condition here, which can 
leave TaskSetManager.successful and TaskSetManager.tasksSuccessful with wrong 
values. A more detailed description and discussion can be found at 
https://issues.apache.org/jira/browse/SPARK-36575 and 
https://github.com/apache/spark/pull/33872


> TaskSchedulerImpl should ignore task finished event if its task was already finished state
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-37300
>                 URL: https://issues.apache.org/jira/browse/SPARK-37300
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.2.0
>            Reporter: hujiahua
>            Priority: Major
>
> `TaskSchedulerImpl` handles task-finished events in `handleSuccessfulTask` and 
> `handleFailedTask`, but in some cases the task is already in a finished state, 
> and the task-finished event should then be ignored.
> Case description: 
> When an executor finishes a task of some stage, the driver receives a 
> StatusUpdate event to handle it. At the same time, the driver finds that the 
> executor's heartbeat has timed out, so the driver also needs to handle an 
> ExecutorLost event concurrently. There is a race condition here, which can 
> leave TaskSetManager.successful and TaskSetManager.tasksSuccessful with wrong 
> values. A more detailed description and discussion can be found at 
> https://issues.apache.org/jira/browse/SPARK-36575 and 
> https://github.com/apache/spark/pull/33872



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
