[ https://issues.apache.org/jira/browse/SPARK-44179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-44179:
-----------------------------------
Labels: pull-request-available (was: )
> When a task fails while its speculative task is still running, the number of
> executors calculated by the dynamic scheduler is incorrect
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-44179
> URL: https://issues.apache.org/jira/browse/SPARK-44179
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.1
> Reporter: liangyongyuan
> Priority: Major
> Labels: pull-request-available
>
> Suppose a stage has Task 1, whose original attempt Task 1.0 and speculative
> attempt Task 1.1 are running concurrently. The dynamic scheduler calculates the
> number of executors as 2 (pendingTask: 0, pendingSpeculative: 0, running: 2).
> At this point, Task 1.0 fails, and the dynamic scheduler recalculates the
> number of executors as 2 (pendingTask: 1, pendingSpeculative: 0, running: 1).
> Because Task 1.0 failed, copyRunning(1) becomes 1. As a result, Task 1 is
> speculated again and a SparkListenerSpeculativeTaskSubmitted event is
> triggered. The dynamic scheduler then calculates the number of executors as
> 3 (pendingTask: 1, pendingSpeculative: 1, running: 1), which is not as
> expected: Task 1 is now counted both as a pending task and as a pending
> speculative task.
> Task 1.2 then starts and is marked as a speculative task. However, the
> dynamic scheduler still calculates the number of executors as 3 (pendingTask:
> 1, pendingSpeculative: 1, running: 1), which again is not as expected.
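The counts quoted in this report are consistent with the executor target being the plain sum pendingTask + pendingSpeculative + running, with one task slot per executor. The Scala sketch below replays the first part of the scenario under that assumption. `Counts`, `Event`, and `Replay` are hypothetical names, and the transitions (a failed attempt moves one slot from running to pending; a speculative-task submission adds one pending speculative slot) are inferred from the figures above, not taken from Spark's `ExecutorAllocationManager`:

```scala
// Hypothetical model of the counts described in this report.
// Not Spark's ExecutorAllocationManager; names and transitions are assumptions.
final case class Counts(pendingTask: Int, pendingSpeculative: Int, running: Int) {
  // Assumes one task slot per executor, as the report's figures imply.
  def executors: Int = pendingTask + pendingSpeculative + running
}

sealed trait Event
// A running attempt fails and the task is re-queued as pending.
case object AttemptFailed extends Event
// A SparkListenerSpeculativeTaskSubmitted-style notification for the task.
case object SpeculativeTaskSubmitted extends Event

object Replay {
  // Assumed bookkeeping for each event, inferred from the report's numbers.
  def step(c: Counts, e: Event): Counts = e match {
    case AttemptFailed =>
      c.copy(pendingTask = c.pendingTask + 1, running = c.running - 1)
    case SpeculativeTaskSubmitted =>
      c.copy(pendingSpeculative = c.pendingSpeculative + 1)
  }

  def main(args: Array[String]): Unit = {
    // Task 1.0 and speculative Task 1.1 are both running.
    val start = Counts(pendingTask = 0, pendingSpeculative = 0, running = 2)
    // scanLeft keeps every intermediate state, beginning with `start`.
    val states = List[Event](AttemptFailed, SpeculativeTaskSubmitted).scanLeft(start)(step)
    states.foreach(s => println(s"$s -> ${s.executors} executors"))
  }
}
```

Running `Replay.main` prints `Counts(0,0,2) -> 2 executors`, `Counts(1,0,1) -> 2 executors`, and `Counts(1,1,1) -> 3 executors`. In the last state Task 1 contributes to both `pendingTask` and `pendingSpeculative` while one copy is still running, which accounts for the extra executor.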
--
This message was sent by Atlassian Jira
(v8.20.10#820010)