[ https://issues.apache.org/jira/browse/SPARK-22339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-22339:
------------------------------------

    Assignee: Apache Spark

> Push epoch updates to executors on fetch failure to avoid fetch retries for 
> missing executors
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-22339
>                 URL: https://issues.apache.org/jira/browse/SPARK-22339
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Juan Rodríguez Hortalá
>            Assignee: Apache Spark
>         Attachments: push_epoch_update-WIP.diff
>
>
> When a task fails with a fetch error, the DAGScheduler unregisters the 
> shuffle blocks hosted by the serving executor (or even by all the executors 
> on the failing host, with the external shuffle service and 
> spark.files.fetchFailure.unRegisterOutputOnHost enabled) from the shuffle 
> block directory kept by MapOutputTracker, which then increments its epoch as 
> a result. This event is only signaled to the other executors when a task 
> carrying the new epoch starts on each executor. As a result, executors 
> reading from the failed executors keep retrying shuffle block fetches from 
> them, even though the driver already knows those executors are lost and 
> those blocks are no longer available at those locations. This hurts job 
> runtime, especially for long shuffles and for executor failures near the end 
> of a stage, when the only pending tasks are shuffle reads. 
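> As a rough, self-contained illustration of the current behavior (not actual 
> Spark code; WorkerTracker is a simplified stand-in for the executor-side 
> MapOutputTrackerWorker), the cached block locations are only dropped when a 
> newer epoch is observed, which today only happens when the next task starts:
>
> import scala.collection.mutable
>
> // Simplified stand-in for the executor-side map output tracker: shuffle
> // block locations are cached and only dropped when a newer epoch is seen.
> class WorkerTracker(fetchFromDriver: Int => Seq[String]) {
>   private var epoch = 0L
>   private val mapStatuses = mutable.Map.empty[Int, Seq[String]]
>
>   def updateEpoch(newEpoch: Long): Unit = synchronized {
>     if (newEpoch > epoch) { epoch = newEpoch; mapStatuses.clear() }
>   }
>
>   def locations(shuffleId: Int): Seq[String] = synchronized {
>     mapStatuses.getOrElseUpdate(shuffleId, fetchFromDriver(shuffleId))
>   }
> }
>
> object StaleRetryDemo extends App {
>   val driverView = mutable.Map(0 -> Seq("exec-1", "exec-2"))
>   val tracker = new WorkerTracker(driverView)
>   println(tracker.locations(0))  // exec-1, exec-2 (now cached)
>   driverView(0) = Seq("exec-2")  // driver loses exec-1 and bumps its epoch
>   println(tracker.locations(0))  // still exec-1, exec-2: retries hit exec-1
>   tracker.updateEpoch(1L)        // today this only arrives with the next task
>   println(tracker.locations(0))  // exec-2
> }
>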
> This could be improved by pushing the epoch update to the executors without 
> waiting for a new task. In the attached patch I sketch a possible solution 
> that sends the updated epoch from the driver to the executors by 
> piggybacking on the executor heartbeat response. ShuffleBlockFetcherIterator, 
> RetryingBlockFetcher and BlockFetchingListener are modified so block 
> locations are re-checked on each fetch retry. This doesn't introduce 
> additional traffic, as MapOutputTrackerWorker.mapStatuses is shared by all 
> tasks running on the same Executor, and the lookup of the new shuffle block 
> directory would have happened anyway when the new epoch is detected at the 
> start of the next task. 
> I would like to hear the community's opinion on this approach. 


