[ https://issues.apache.org/jira/browse/SPARK-27630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dzcxzl updated SPARK-27630:
---------------------------
Description:
In the case of stage retry, a {{taskEnd}} event from the zombie stage can
make {{totalRunningTasks}} negative, which causes the job to get stuck.
A similar problem exists with {{stageIdToTaskIndices}} and
{{stageIdToSpeculativeTaskIndices}}.
If the {{taskEnd}} event from the zombie stage reports a failed task,
{{stageIdToTaskIndices}} or {{stageIdToSpeculativeTaskIndices}} removes the
task index of the active stage, and {{totalPendingTasks}} increases
unexpectedly.
was: In the case of stage retry, the onTaskEnd event may be sent after the new
stage is submitted. This causes the ExecutorAllocationManager to compute a
negative number of currently running tasks.
> Stage retry causes totalRunningTasks calculation to be negative
> ---------------------------------------------------------------
>
> Key: SPARK-27630
> URL: https://issues.apache.org/jira/browse/SPARK-27630
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: dzcxzl
> Priority: Minor
>
> In the case of stage retry, a {{taskEnd}} event from the zombie stage can
> make {{totalRunningTasks}} negative, which causes the job to get stuck.
> A similar problem exists with {{stageIdToTaskIndices}} and
> {{stageIdToSpeculativeTaskIndices}}.
> If the {{taskEnd}} event from the zombie stage reports a failed task,
> {{stageIdToTaskIndices}} or {{stageIdToSpeculativeTaskIndices}} removes the
> task index of the active stage, and {{totalPendingTasks}} increases
> unexpectedly.
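
The failure mode described above can be reproduced in a small toy model. The
sketch below is not Spark's ExecutorAllocationManager code: the class
AllocationTracker, its method names, and the key_by_attempt flag are
hypothetical and exist only for illustration. It assumes that bookkeeping is
keyed by stage ID alone, so a retried attempt reuses the entry of the zombie
attempt. Keying by (stage ID, attempt ID) is shown as one possible way to
isolate late events; it is an assumption of this sketch, not something the
issue states.

```python
class AllocationTracker:
    """Toy model of per-stage task bookkeeping (hypothetical, not Spark code)."""

    def __init__(self, key_by_attempt):
        self.key_by_attempt = key_by_attempt
        self.num_tasks = {}      # key -> number of tasks in the stage
        self.running = {}        # key -> number of running tasks
        self.task_indices = {}   # key -> indices of tasks that have started

    def _key(self, stage_id, attempt_id):
        # Stage-only keys make a zombie attempt and its retry share one entry.
        return (stage_id, attempt_id) if self.key_by_attempt else stage_id

    def on_stage_submitted(self, stage_id, attempt_id, num_tasks):
        key = self._key(stage_id, attempt_id)
        self.num_tasks[key] = num_tasks
        self.running[key] = 0
        self.task_indices[key] = set()

    def on_stage_completed(self, stage_id, attempt_id):
        key = self._key(stage_id, attempt_id)
        for table in (self.num_tasks, self.running, self.task_indices):
            table.pop(key, None)

    def on_task_start(self, stage_id, attempt_id, index):
        key = self._key(stage_id, attempt_id)
        self.running[key] += 1
        self.task_indices[key].add(index)

    def on_task_end(self, stage_id, attempt_id, index, failed):
        key = self._key(stage_id, attempt_id)
        if key not in self.running:
            return  # no live entry for this key: drop the late event
        self.running[key] -= 1
        if failed:
            # A failed task must run again, so its index becomes pending.
            self.task_indices[key].discard(index)

    def total_running_tasks(self):
        return sum(self.running.values())

    def total_pending_tasks(self):
        return sum(n - len(self.task_indices[k])
                   for k, n in self.num_tasks.items())


def replay(tracker):
    """Replay a stage retry with two late, failed taskEnd events."""
    tracker.on_stage_submitted(1, 0, num_tasks=2)
    tracker.on_task_start(1, 0, 0)
    tracker.on_task_start(1, 0, 1)
    tracker.on_stage_completed(1, 0)               # attempt 0 becomes a zombie
    tracker.on_stage_submitted(1, 1, num_tasks=2)  # retry of the same stage
    tracker.on_task_start(1, 1, 0)                 # one task really running
    tracker.on_task_end(1, 0, 0, failed=True)      # late event from the zombie
    tracker.on_task_end(1, 0, 1, failed=True)      # late event from the zombie
    return tracker.total_running_tasks(), tracker.total_pending_tasks()


print(replay(AllocationTracker(key_by_attempt=False)))  # (-1, 2)
print(replay(AllocationTracker(key_by_attempt=True)))   # (1, 1)
```

In this replay one task of the retried attempt is actually running and one is
pending. With stage-only keys the model reports -1 running and 2 pending
tasks, which matches the symptoms in the description; with per-attempt keys
the late events are dropped and the counts stay at 1 and 1.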