[
https://issues.apache.org/jira/browse/SPARK-45182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mayur Bhosale updated SPARK-45182:
----------------------------------
Description:
SPARK-25342 added support for rolling back a shuffle map stage so that all
tasks of the stage can be retried when the stage output is indeterminate. This
is done by clearing all map outputs at the time of stage submission. This
approach works well except for the following case.
Assume both Shuffle 1 and Shuffle 2 are indeterminate:
ShuffleMapStage1 ----> Shuffle 1 ----> ShuffleMapStage2 ----> Shuffle 2
----> ResultStage
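For context, a pipeline of the shape below produces such indeterminate
shuffles. The snippet is purely illustrative (it assumes an existing
SparkContext `sc` and is not taken from the workload that hit the issue);
repartition() distributes rows to reducers from a random starting position, so
a recomputed map task can send different rows to different reducers, and Spark
treats the downstream shuffle as indeterminate as well.
{code:scala}
// Illustrative only: `sc` is an assumed, already-created SparkContext.
val result = sc.parallelize(1 to 1000, 10)
  .repartition(10)        // Shuffle 1 (indeterminate: random reducer assignment)
  .map(x => (x % 5, x))   // ShuffleMapStage2
  .reduceByKey(_ + _)     // Shuffle 2 (indeterminate by propagation)
  .collect()              // ResultStage
{code}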
* ShuffleMapStage1 completes
* A task from ShuffleMapStage2 fails with FetchFailed while other tasks are
still running
* Both ShuffleMapStage1 and ShuffleMapStage2 are rolled back and retried
* The reattempt of ShuffleMapStage1 completes
* The reattempt of ShuffleMapStage2 is scheduled for execution
* Before all tasks of the ShuffleMapStage2 reattempt can finish, one or more
laggard tasks from the original attempt of ShuffleMapStage2 finish, and
ShuffleMapStage2 also gets marked as complete
* The ResultStage gets scheduled and finishes
Internally at Uber, we have been using the stage rollback functionality even
for deterministic stages since Spark 2.4.3, to add fault tolerance against
servers going down in the [remote shuffle
service|https://github.com/uber/RemoteShuffleService], and we have faced this
scenario quite often.
Ideally, such laggard tasks should not be counted towards partition completion.
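To make the suggestion concrete, here is a minimal, self-contained sketch of
the intended behaviour. It is a toy model, not Spark's actual DAGScheduler or
MapOutputTracker code, and every name in it (StageState, TaskEnd,
handleTaskCompletion, rollback) is illustrative: once a stage has been rolled
back and resubmitted, a completion carrying an older stage attempt id is
ignored instead of being counted towards partition completion.
{code:scala}
// Toy model of per-stage bookkeeping; names are illustrative, not Spark internals.
object LaggardTaskDemo {
  final case class TaskEnd(stageId: Int, stageAttemptId: Int, partition: Int)

  final class StageState(val stageId: Int, val numPartitions: Int) {
    var latestAttempt: Int = 0
    private val finishedPartitions = scala.collection.mutable.Set[Int]()

    // Rolling back an indeterminate stage: bump the attempt number and clear all
    // previously registered outputs (mirrors clearing map outputs on resubmission).
    def rollback(): Unit = {
      latestAttempt += 1
      finishedPartitions.clear()
    }

    // Suggested behaviour: a completion from an older attempt is a laggard whose
    // output may be based on stale parent data, so it is not counted towards
    // partition completion of the rolled-back stage.
    def handleTaskCompletion(event: TaskEnd): Unit = {
      if (event.stageAttemptId < latestAttempt) {
        println(s"Ignoring laggard completion from attempt ${event.stageAttemptId} " +
          s"of stage ${event.stageId}")
      } else {
        finishedPartitions += event.partition
      }
    }

    def isComplete: Boolean = finishedPartitions.size == numPartitions
  }

  def main(args: Array[String]): Unit = {
    val stage2 = new StageState(stageId = 2, numPartitions = 1)
    stage2.rollback()                              // FetchFailed -> rolled back and retried
    stage2.handleTaskCompletion(TaskEnd(2, 0, 0))  // laggard from the original attempt: ignored
    println(stage2.isComplete)                     // false: must wait for the reattempt
    stage2.handleTaskCompletion(TaskEnd(2, 1, 0))  // task from the reattempt: counted
    println(stage2.isComplete)                     // true
  }
}
{code}
Without the attempt-id check, the laggard completion alone would mark the stage
complete, which is the behaviour described in the scenario above.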
was:
SPARK-25342 added support for rolling back a shuffle map stage so that all
tasks of the stage can be retried when the stage output is indeterministic.
This is done by clearing all map outputs at the time of stage submission. This
approach works well except for the following case.
Assume both Shuffle 1 and Shuffle 2 are indeterministic:
ShuffleMapStage1 ----> Shuffle 1 ----> ShuffleMapStage2 ----> Shuffle 2 ---->
ResultStage
* ShuffleMapStage1 completes
* A task from ShuffleMapStage2 fails with FetchFailed while other tasks are
still running
* Both ShuffleMapStage1 and ShuffleMapStage2 are rolled back and retried
* The reattempt of ShuffleMapStage1 completes
* The reattempt of ShuffleMapStage2 is scheduled for execution
* Before all tasks of the ShuffleMapStage2 reattempt can finish, one or more
laggard tasks from the original attempt of ShuffleMapStage2 finish, and
ShuffleMapStage2 also gets marked as complete
* The ResultStage gets scheduled and finishes
Internally at Uber, we have been using the stage rollback functionality even
for deterministic stages since Spark 2.4.3, to add fault tolerance against
servers going down in the [remote shuffle
service|https://github.com/uber/RemoteShuffleService], and we have faced this
scenario quite often.
Ideally, such laggard tasks should not be counted towards partition completion.
Summary: Ignore task completion from old stage after retrying
indeterminate stages (was: Ignore task completion from old stage after
retrying indeterministic stages)
> Ignore task completion from old stage after retrying indeterminate stages
> -------------------------------------------------------------------------
>
> Key: SPARK-45182
> URL: https://issues.apache.org/jira/browse/SPARK-45182
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.2
> Reporter: Mayur Bhosale
> Priority: Minor
>
> SPARK-25342 added support for rolling back a shuffle map stage so that all
> tasks of the stage can be retried when the stage output is indeterminate.
> This is done by clearing all map outputs at the time of stage submission.
> This approach works well except for the following case.
> Assume both Shuffle 1 and Shuffle 2 are indeterminate:
> ShuffleMapStage1 ----> Shuffle 1 ----> ShuffleMapStage2 ----> Shuffle 2
> ----> ResultStage
> * ShuffleMapStage1 completes
> * A task from ShuffleMapStage2 fails with FetchFailed while other tasks are
> still running
> * Both ShuffleMapStage1 and ShuffleMapStage2 are rolled back and retried
> * The reattempt of ShuffleMapStage1 completes
> * The reattempt of ShuffleMapStage2 is scheduled for execution
> * Before all tasks of the ShuffleMapStage2 reattempt can finish, one or more
> laggard tasks from the original attempt of ShuffleMapStage2 finish, and
> ShuffleMapStage2 also gets marked as complete
> * The ResultStage gets scheduled and finishes
> Internally at Uber, we have been using the stage rollback functionality even
> for deterministic stages since Spark 2.4.3, to add fault tolerance against
> servers going down in the [remote shuffle
> service|https://github.com/uber/RemoteShuffleService], and we have faced this
> scenario quite often.
> Ideally, such laggard tasks should not be counted towards partition
> completion.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]