[ 
https://issues.apache.org/jira/browse/SPARK-33747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-33747:
--------------------------------
    Fix Version/s:     (was: 3.0.1)
                       (was: 2.4.5)

> Avoid calling unregisterMapOutput when the map stage is being rerun.
> --------------------------------------------------------------------
>
>                 Key: SPARK-33747
>                 URL: https://issues.apache.org/jira/browse/SPARK-33747
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager
>    Affects Versions: 2.4.5, 3.0.1
>            Reporter: weixiuli
>            Priority: Major
>
> When a fetch failure happens, DAGScheduler tries to unregister the
> corresponding map output. The current logic has a race condition: a new map
> stage attempt can be running while the current reduce stage attempt reports
> another fetch failure. The sequence is as follows. The reduce stage first
> reports a fetch failure, which causes the map stage to be rerun. The rerunning
> map stage may then register a new MapStatus for the failed MapId before the
> current reduce stage reports another fetch failure for the same MapId. The
> current reduce stage attempt is still the latest attempt, because the new map
> stage has not yet completed. In this case, if the map output is always
> unregistered, the scheduler may actually unregister the map output produced
> by the new map stage attempt.
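
The race described above can be illustrated with a small toy model. This is
only a sketch under stated assumptions, not Spark's implementation: the
classes ToyMapOutputTracker and ToyScheduler, their methods, and the
running_map_stages set are all hypothetical names invented for illustration.
The guard shown (skip unregistering while the map stage is being rerun) is
one plausible reading of the issue title, not necessarily the actual fix.

```python
class ToyMapOutputTracker:
    """Hypothetical stand-in for a map output registry; not Spark's API."""

    def __init__(self):
        # (shuffle_id, map_id) -> location of the registered map output
        self.outputs = {}

    def register(self, shuffle_id, map_id, location):
        self.outputs[(shuffle_id, map_id)] = location

    def unregister(self, shuffle_id, map_id):
        self.outputs.pop((shuffle_id, map_id), None)


class ToyScheduler:
    """Hypothetical scheduler that guards unregistration during a rerun."""

    def __init__(self, tracker):
        self.tracker = tracker
        # Shuffle ids whose map stage is currently being rerun (assumption).
        self.running_map_stages = set()

    def handle_fetch_failure(self, shuffle_id, map_id):
        """Return True if the map output was unregistered, False if skipped."""
        if shuffle_id in self.running_map_stages:
            # The map stage is already being rerun, so the registered output
            # may belong to the new attempt. Leave it alone.
            return False
        self.tracker.unregister(shuffle_id, map_id)
        # Model the resubmission of the map stage.
        self.running_map_stages.add(shuffle_id)
        return True


def simulate_race():
    tracker = ToyMapOutputTracker()
    scheduler = ToyScheduler(tracker)

    # Original map attempt registered its output.
    tracker.register(0, 7, "executor-1")

    # First fetch failure: stale output is removed and the map stage reruns.
    first = scheduler.handle_fetch_failure(0, 7)

    # The rerunning map stage registers a fresh output for the same map id.
    tracker.register(0, 7, "executor-2")

    # Second fetch failure from the same (old) reduce attempt arrives late.
    second = scheduler.handle_fetch_failure(0, 7)

    return first, second, tracker.outputs.get((0, 7))
```

Calling simulate_race() walks through the sequence in the description: the
first failure removes the stale output, while the late second failure is
ignored, so the output registered by the new map attempt survives. Without
the guard in handle_fetch_failure, the second call would remove it.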



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
