liangyongyuan created SPARK-44179:
-------------------------------------

             Summary: When a task fails while its speculative copy is still 
running, the number of dynamically allocated executors is calculated 
incorrectly
                 Key: SPARK-44179
                 URL: https://issues.apache.org/jira/browse/SPARK-44179
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.4.1
            Reporter: liangyongyuan


Suppose a stage contains Task 1, with attempt Task 1.0 and a speculative 
attempt Task 1.1 running concurrently. The dynamic allocation manager 
calculates the number of executors as 2 (pendingTask: 0, pendingSpeculative: 
0, running: 2).
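
The executor target described above can be sketched as follows. This is a 
simplified model for illustration only, not Spark's actual code (the real 
logic lives in ExecutorAllocationManager and also accounts for cores per 
executor and allocation ratios); the function name and `tasks_per_executor` 
parameter are assumptions made for this sketch.

```python
import math

def target_executors(pending, pending_speculative, running,
                     tasks_per_executor=1):
    """Toy model: the executor target is the total outstanding task count
    (pending + pending speculative + running) divided by task slots per
    executor, rounded up."""
    total_tasks = pending + pending_speculative + running
    return math.ceil(total_tasks / tasks_per_executor)

# Task 1.0 and speculative Task 1.1 are both running, nothing is pending:
print(target_executors(0, 0, 2))  # 2
```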

At this point, Task 1.0 fails, and the dynamic scheduler recalculates the 
number of executors as 2 (pendingTask: 1, pendingSpeculative: 0, running: 1).

Because Task 1.0 failed, copiesRunning(1) drops to 1. As a result, Task 1 is 
speculated again and a SparkListenerSpeculativeTaskSubmitted event is fired, 
even though a speculative attempt (Task 1.1) is already running. The dynamic 
allocation manager then calculates the number of executors as 3 (pendingTask: 
1, pendingSpeculative: 1, running: 1), which is not the expected 2.

Then Task 1.2 starts and is marked as a speculative task, yet the dynamic 
allocation manager still calculates the number of executors as 3 (pendingTask: 
1, pendingSpeculative: 1, running: 1), which again is not as expected.
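
The drift through this event sequence can be reproduced with hypothetical 
listener-side counters (a sketch under the assumptions of this report, not 
Spark's actual bookkeeping):

```python
class Counters:
    """Toy listener-side counters tracking outstanding tasks."""
    def __init__(self):
        self.pending = 0              # tasks waiting to be (re)scheduled
        self.pending_speculative = 0  # speculative tasks submitted, not started
        self.running = 0              # task attempts currently executing
    def total(self):
        return self.pending + self.pending_speculative + self.running

c = Counters()

# Step 1: Task 1.0 and speculative Task 1.1 are running.
c.running = 2
assert c.total() == 2

# Step 2: Task 1.0 fails and moves from running back to pending.
c.running -= 1
c.pending += 1
assert c.total() == 2

# Step 3: copiesRunning(1) dropped to 1, so Task 1 is speculated again and a
# SparkListenerSpeculativeTaskSubmitted event fires, although a speculative
# attempt (Task 1.1) is already running. The target inflates to 3.
c.pending_speculative += 1
print(c.pending, c.pending_speculative, c.running, c.total())  # 1 1 1 3
```

The sketch shows the core of the report: the second speculative submission for 
Task 1 is double-counted against the already-running Task 1.1, so the total 
never returns to 2.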



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
