[ 
https://issues.apache.org/jira/browse/SPARK-26268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826525#comment-16826525
 ] 

Ben Sidhom commented on SPARK-26268:
------------------------------------

This is related to, but different from, SPARK-25299. The goal here is simply 
to modify the scheduler so that different (custom) shuffle implementations can 
be plugged in.

The existing problem is that assumptions about the Spark "external shuffle 
service" (i.e., the one that ships with Spark and must be installed on 
NodeManagers, or on the machines in standalone mode) are baked into the 
scheduler.

I've uploaded a PR to [https://github.com/apache/spark/pull/24462]. This change 
allows clients to declare that the configured shuffle service is external to 
the Spark deployment without mandating anything else about its implementation.
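
As a rough illustration of the scheduling decision involved, here is a minimal 
Python sketch. It is a toy model, not Spark's scheduler and not the code in the 
PR: MapOutputTracker here is a simplified stand-in, and maps_to_rerun and the 
shuffle_is_external flag are hypothetical names chosen for the example. It 
only shows the idea that map outputs are invalidated on executor loss unless 
the shuffle data is declared to live outside the executors.

```python
from dataclasses import dataclass, field


@dataclass
class MapOutputTracker:
    """Toy registry (hypothetical): finished map task id -> executor id."""
    outputs: dict = field(default_factory=dict)

    def register(self, map_id: int, executor_id: str) -> None:
        self.outputs[map_id] = executor_id

    def remove_outputs_on(self, executor_id: str) -> list:
        """Forget outputs written by executor_id and return their map ids."""
        lost = sorted(m for m, e in self.outputs.items() if e == executor_id)
        for m in lost:
            del self.outputs[m]
        return lost


def maps_to_rerun(tracker: MapOutputTracker,
                  lost_executor: str,
                  shuffle_is_external: bool) -> list:
    """Return the map tasks to reschedule after an executor is lost.

    shuffle_is_external is an assumed flag: True means the shuffle data is
    stored outside the executors, so losing an executor loses no map output.
    """
    if shuffle_is_external:
        return []
    return tracker.remove_outputs_on(lost_executor)
```

With the flag set, losing an executor leaves the registered map outputs 
untouched; without it, every map task that ran on the lost executor is returned 
for rescheduling.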

> Decouple shuffle data from Spark deployment
> -------------------------------------------
>
>                 Key: SPARK-26268
>                 URL: https://issues.apache.org/jira/browse/SPARK-26268
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 2.4.0
>            Reporter: Ben Sidhom
>            Priority: Major
>
> Right now the batch scheduler assumes that shuffle data is tied to executors. 
> As a result, when an executor is lost, any map tasks that ran on that 
> executor are rescheduled unless the "external" shuffle service is being used. 
> Note that this service is only external in the sense that it does not live 
> within executors themselves; its implementation cannot be swapped out and it 
> is assumed to speak the BlockManager language.
> The following changes would facilitate external shuffle (see SPARK-25299 for 
> motivation):
>  * Do not rerun map tasks on lost executors when shuffle data is stored 
> externally. For example, this could be determined by a property or by an 
> additional method that all ShuffleManagers implement.
>  * Do not assume that shuffle data is stored in the standard BlockManager 
> format or that a BlockManager is or must be available to ShuffleManagers.
> Note that only the first change is actually required to realize the benefits 
> of remote shuffle implementations, since a phony (or null) BlockManager can 
> be used by shuffle implementations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
