hujiahua created SPARK-37300:
--------------------------------

             Summary: TaskSchedulerImpl should ignore task finished event if its task was already finished state
                 Key: SPARK-37300
                 URL: https://issues.apache.org/jira/browse/SPARK-37300
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.2.0
            Reporter: hujiahua
When an executor finishes a task of some stage, the driver receives a StatusUpdate event to handle it. If, at the same time, the driver finds that the executor's heartbeat has timed out, the driver also needs to handle an ExecutorLost event. There is a race condition here that can leave TaskSetManager.successful and TaskSetManager.tasksSuccessful with wrong results. The problem is that TaskResultGetter.enqueueSuccessfulTask handles the successful task on an asynchronous thread, which means the synchronized lock on TaskSchedulerImpl is released prematurely midway through the update: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala#L61. So TaskSchedulerImpl may handle executorLost first, after which the asynchronous thread goes on to handle the successful task, leaving TaskSetManager.successful and TaskSetManager.tasksSuccessful inconsistent.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
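The guard suggested by the issue title (ignore a task-finished event if the task is already in a finished state) can be sketched as below. This is a hypothetical, heavily simplified model: MiniTaskSetManager and its handlers are illustrative stand-ins, not Spark's actual classes. The idea is that each terminal handler checks a per-task finished flag (analogous to TaskInfo.finished) and returns early, so a late success event arriving after executorLost has already failed the task does not corrupt the counters:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for TaskSetManager (hypothetical, for illustration only).
class MiniTaskSetManager {
    // Per-task terminal flag, analogous to TaskInfo.finished in Spark.
    private final Map<Integer, Boolean> finished = new HashMap<>();
    int tasksSuccessful = 0;
    int tasksFailed = 0;

    // Invoked by the asynchronous result-getter thread for a successful task.
    synchronized void handleSuccessfulTask(int taskIndex) {
        if (finished.getOrDefault(taskIndex, false)) {
            return; // task already reached a finished state: ignore the event
        }
        finished.put(taskIndex, true);
        tasksSuccessful++;
    }

    // Invoked when the task's executor is lost (e.g. heartbeat timeout).
    synchronized void handleFailedTask(int taskIndex) {
        if (finished.getOrDefault(taskIndex, false)) {
            return; // task already reached a finished state: ignore the event
        }
        finished.put(taskIndex, true);
        tasksFailed++;
    }
}

public class RaceSketch {
    public static void main(String[] args) {
        MiniTaskSetManager tsm = new MiniTaskSetManager();
        // The problematic interleaving from the report: ExecutorLost is
        // handled first, then the asynchronous success handler fires for
        // the same task. With the guard, the late success is ignored.
        tsm.handleFailedTask(0);     // driver processes ExecutorLost
        tsm.handleSuccessfulTask(0); // late StatusUpdate result: ignored
        System.out.println(tsm.tasksSuccessful + " " + tsm.tasksFailed);
    }
}
```

Without the early-return guard, the second call would increment tasksSuccessful even though the task was already counted as failed, which is exactly the inconsistent bookkeeping described above.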