[ https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019590#comment-14019590 ]

Matei Zaharia commented on SPARK-2044:
--------------------------------------

Saisai, regarding the pluggable implementation -- if you would like to do it 
based on this doc, be my guest. I see there are a few API differences in your 
version (e.g. I want to be able to request a range of reduce keys, and pass an 
Ordering and an Aggregator to the shuffle). The other issue I ran into is that 
I want to hide the MapOutputTracker behind the ShuffleManager, which I think 
you aren't doing right now. This requires changing DAGScheduler a bit in how it 
interacts with the tracker. The reason is that we found that keeping track of a 
lot of info for each map (in particular the size of its output for each reduce) 
is expensive, and it might be nice to abstract this and try different versions 
of it (e.g. one where reduce tasks query the size from the node they want to 
fetch from).
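
For illustration, here is a rough sketch of what such a manager could look 
like. All names and signatures below are hypothetical -- a sketch of the idea, 
not the interface from the attached doc or my branch:

    import scala.math.Ordering

    // Hypothetical sketch: a ShuffleManager that owns map-output metadata
    // instead of exposing a global MapOutputTracker.
    trait ShuffleHandle                // opaque token returned at registration
    trait ShuffleWriter[K, V]          // map-side writer
    trait ShuffleReader[K, C]          // reduce-side reader

    trait ShuffleManager {
      // Registration captures the key Ordering and the Aggregator up front,
      // so sort- or combine-aware implementations can use them on either side.
      def registerShuffle[K, V, C](
          shuffleId: Int,
          numMaps: Int,
          keyOrdering: Option[Ordering[K]],
          aggregator: Option[Iterator[(K, V)] => Iterator[(K, C)]]): ShuffleHandle

      def getWriter[K, V](handle: ShuffleHandle, mapId: Int): ShuffleWriter[K, V]

      // Read a *range* of reduce partitions [startPartition, endPartition),
      // rather than exactly one partition per reduce task.
      def getReader[K, C](
          handle: ShuffleHandle,
          startPartition: Int,
          endPartition: Int): ShuffleReader[K, C]

      // Per-map output sizes are tracked behind this interface, so alternative
      // strategies (e.g. reducers querying sizes from the serving node) can be
      // tried without changing the DAGScheduler's contract.
      def unregisterShuffle(shuffleId: Int): Unit
    }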

I've pushed my work in progress (still incomplete) to 
https://github.com/mateiz/spark/tree/pluggable-shuffle/core/src/main/scala/org/apache/spark/shuffle.

Raymond, regarding the BlockManager, we haven't thought much about the 
interface there. We want to implement sort-based shuffle using the current one 
if possible, but it would be good to hear ideas. Basically, there are two things 
you want -- to write to a block / file (one issue Yahoo brought up is that 
they'd like these to be bigger than 2 GB) and to fetch a *range* of a block 
remotely (which we sort of hard-code for our current consolidation approach).
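
To make those two concrete, here is a hedged sketch (again, illustrative 
signatures only, not the current BlockManager API):

    import java.io.{InputStream, OutputStream}

    // Hypothetical interface covering the two capabilities above.
    trait ShuffleBlockManager {
      // Stream-based writes avoid capping a block at the 2 GB limit of a
      // single Array[Byte] / ByteBuffer.
      def getBlockWriter(blockId: String): OutputStream

      // Fetch only [offset, offset + length) of a remote block, so a range of
      // a consolidated file can be read without hard-coding the consolidation
      // layout into the fetcher.
      def fetchBlockRange(host: String, port: Int, blockId: String,
                          offset: Long, length: Long): InputStream
    }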

> Pluggable interface for shuffles
> --------------------------------
>
>                 Key: SPARK-2044
>                 URL: https://issues.apache.org/jira/browse/SPARK-2044
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>            Reporter: Matei Zaharia
>            Assignee: Matei Zaharia
>         Attachments: Pluggableshuffleproposal.pdf
>
>
> Given that a lot of the current activity in Spark Core is in shuffles, I 
> wanted to propose factoring out shuffle implementations in a way that will 
> make experimentation easier. Ideally we will converge on one implementation, 
> but for a while, this could also be used to have several implementations 
> coexist. I'm suggesting this because I'm aware of at least three efforts to 
> look at shuffle (from Yahoo!, Intel and Databricks). Some of the things 
> people are investigating are:
> * Push-based shuffle where data moves directly from mappers to reducers
> * Sorting-based instead of hash-based shuffle, to create fewer files (helps a 
> lot with file handles and memory usage on large shuffles)
> * External spilling within a key
> * Changing the level of parallelism or even algorithm for downstream stages 
> at runtime based on statistics of the map output (this is a thing we had 
> prototyped in the Shark research project but never merged in core)
> I've attached a design doc with a proposed interface. It's not too crazy 
> because the interface between shuffles and the rest of the code is already 
> pretty narrow (just some iterators for reading data and a writer interface 
> for writing it). Bigger changes will be needed in the interaction with 
> DAGScheduler and BlockManager for some of the ideas above, but we can handle 
> those separately, and this interface will allow us to experiment with some 
> short-term stuff sooner.
> If things go well I'd also like to send a sort-based shuffle implementation 
> for 1.1, but we'll see how the timing on that works out.
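
To illustrate how narrow that boundary is, a minimal hedged sketch of the 
reader and writer sides (hypothetical names, not the ones in the attached 
PDF):

    // Hypothetical sketch: the shuffle boundary reduces to a writer
    // interface for map output and an iterator for reduce input.
    trait ShuffleWriter[K, V] {
      def write(records: Iterator[(K, V)]): Unit  // consume one map task's output
      def stop(success: Boolean): Unit            // commit on success, clean up on failure
    }

    trait ShuffleReader[K, C] {
      def read(): Iterator[(K, C)]                // fetched (and possibly aggregated,
                                                  // sorted) records for this task
    }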



