[
https://issues.apache.org/jira/browse/SPARK-33747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuming Wang updated SPARK-33747:
--------------------------------
Fix Version/s: (was: 3.0.1)
(was: 2.4.5)
> Avoid calling unregisterMapOutput when the map stage is being rerun.
> --------------------------------------------------------------------
>
> Key: SPARK-33747
> URL: https://issues.apache.org/jira/browse/SPARK-33747
> Project: Spark
> Issue Type: Bug
> Components: Block Manager
> Affects Versions: 2.4.5, 3.0.1
> Reporter: weixiuli
> Priority: Major
>
> When a fetch failure happens, DAGScheduler tries to unregister the
> corresponding map output. The current logic has a race condition: a new map
> stage attempt may already be running when the current reduce stage attempt
> returns another fetch failure. The sequence is as follows: the current reduce
> stage attempt first returns a fetch failure, which causes the map stage to be
> rerun; the rerunning map stage may then register a new MapStatus for the
> failed MapId before the current reduce stage attempt returns another fetch
> failure for the same MapId. (The current reduce stage attempt is still the
> latest attempt because the new map stage attempt has not yet completed.) In
> this case, if the map output is always unregistered, the scheduler may
> actually unregister the map output produced by the new map stage attempt.
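
The ordering described above can be replayed with a small toy model. This is only a sketch under stated assumptions, not Spark code: `ToyMapOutputTracker`, its methods, and the `map_stage_running` flag are hypothetical names invented for illustration, and the guard shown is just one reading of the issue title, not necessarily the fix that was merged.

```python
class ToyMapOutputTracker:
    """Hypothetical stand-in for the scheduler's map output bookkeeping."""

    def __init__(self):
        # map_id -> number of the map stage attempt that produced the output
        self.outputs = {}
        # True once a fetch failure has triggered a rerun of the map stage
        self.map_stage_running = False

    def register_map_output(self, map_id, attempt):
        self.outputs[map_id] = attempt

    def unregister_map_output(self, map_id):
        self.outputs.pop(map_id, None)

    def on_fetch_failure_always(self, map_id):
        # Behaviour described in the report: always unregister the output.
        self.unregister_map_output(map_id)
        self.map_stage_running = True

    def on_fetch_failure_guarded(self, map_id):
        # Assumed guard: skip unregistering while the map stage is rerunning.
        if not self.map_stage_running:
            self.unregister_map_output(map_id)
            self.map_stage_running = True


def replay(handler_name):
    """Replay the sequence from the report and return the surviving outputs."""
    tracker = ToyMapOutputTracker()
    tracker.register_map_output(map_id=7, attempt=0)
    handle = getattr(tracker, handler_name)
    handle(7)  # first fetch failure: attempt 0 output dropped, rerun starts
    tracker.register_map_output(map_id=7, attempt=1)  # rerun registers new output
    handle(7)  # second, stale fetch failure from the same reduce stage attempt
    return tracker.outputs


if __name__ == "__main__":
    print(replay("on_fetch_failure_always"))
    print(replay("on_fetch_failure_guarded"))
```

In this model the unconditional handler ends with no registered output for map 7, even though the rerun had already produced one, while the guarded handler keeps the output from attempt 1.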