Ben Sidhom created SPARK-26268:
----------------------------------

             Summary: Decouple shuffle data from Spark deployment
                 Key: SPARK-26268
                 URL: https://issues.apache.org/jira/browse/SPARK-26268
             Project: Spark
          Issue Type: Improvement
          Components: Shuffle
    Affects Versions: 2.4.0
            Reporter: Ben Sidhom


Right now the batch scheduler assumes that shuffle data is tied to executors. 
As a result, when an executor is lost, any map tasks that ran on that executor 
are rescheduled unless the "external" shuffle service is being used. Note that 
this service is external only in the sense that it does not live within the 
executors themselves; its implementation cannot be swapped out, and it is 
assumed to speak the BlockManager protocol.

The following changes would facilitate external shuffle (see SPARK-25299 for 
motivation):
 * Do not rerun map tasks on lost executors when shuffle data is stored 
externally. For example, this could be determined by a property or by an 
additional method that all ShuffleManagers implement.
 * Do not assume that shuffle data is stored in the standard BlockManager 
format or that a BlockManager is or must be available to ShuffleManagers.
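
The first change can be pictured with a small model. This is a sketch in 
Python, not Spark code: ShuffleManager, stores_data_externally, MapTask and 
map_tasks_to_rerun are hypothetical names chosen for illustration, and the 
real scheduler logic is considerably more involved.

```python
from dataclasses import dataclass


class ShuffleManager:
    """Hypothetical stand-in for a shuffle manager interface (not Spark's API)."""

    def stores_data_externally(self) -> bool:
        # Assumed capability method; by default, map output lives on the executor.
        return False


class RemoteShuffleManager(ShuffleManager):
    """Hypothetical manager whose map output survives executor loss."""

    def stores_data_externally(self) -> bool:
        return True


@dataclass(frozen=True)
class MapTask:
    task_id: int
    executor_id: str


def map_tasks_to_rerun(completed, lost_executor, manager):
    """Return the completed map tasks whose output was lost with the executor."""
    if manager.stores_data_externally():
        # Shuffle data is decoupled from the executor, so nothing is lost.
        return []
    return [t for t in completed if t.executor_id == lost_executor]
```

With the default manager, losing an executor marks every map task that ran 
on it for rescheduling; with the remote manager, none are rescheduled. The 
same decision could equally be driven by a configuration property instead 
of a method.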

Note that only the first change is strictly required to realize the benefits 
of remote shuffle implementations, because shuffle implementations can use a 
phony (or null) BlockManager.
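
As a rough illustration of the phony BlockManager idea, the sketch below 
pairs a no-op block manager with a writer that keeps map output elsewhere. 
It is written in Python for brevity and does not mirror Spark's actual 
classes: NullBlockManager, RemoteShuffleWriter, put_block, get_block and the 
dict standing in for a remote store are all assumptions made for this 
example.

```python
class NullBlockManager:
    """Hypothetical no-op block manager: accepts calls but holds no blocks."""

    def put_block(self, block_id: str, data: bytes) -> bool:
        # Nothing is stored locally; report that the block was not kept.
        return False

    def get_block(self, block_id: str):
        # No block is ever available from this manager.
        return None


class RemoteShuffleWriter:
    """Hypothetical writer that keeps map output in an external store."""

    def __init__(self, store: dict, block_manager: NullBlockManager):
        self.store = store  # stands in for a remote storage service
        self.block_manager = block_manager  # present only to satisfy callers

    def write(self, block_id: str, data: bytes) -> None:
        # Map output goes to the external store, never to the block manager.
        self.store[block_id] = data
```

The block manager exists here only so that code expecting one still has an 
object to call; all shuffle data lives in the external store.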



