[ https://issues.apache.org/jira/browse/SPARK-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348995#comment-14348995 ]

Mridul Muralidharan commented on SPARK-6169:
--------------------------------------------

I have not done much research into how it can be improved, but the join code
we have is fairly old and predates the shuffle improvements that have since
been made. Even naive tagging of tuples followed by groupByKey or combineByKey
has been seen to be much more efficient than the current use of
ExternalAppendOnlyMap when sort-based shuffle is on. I am sure the folks
working on Spark SQL would have much more experience improving joins!
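For illustration only, here is a minimal, single-machine sketch of the "tag, then group by key" join idea mentioned above. It assumes plain Python lists of (key, value) pairs stand in for pair RDDs and a dictionary stands in for the shuffle; it does not use Spark's API, and the helper names (tag, group_by_key, tagged_join) are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical side markers used to tag records from each input.
LEFT, RIGHT = 0, 1


def tag(pairs, side):
    """Tag each record with its origin: (k, v) -> (k, (side, v))."""
    return [(k, (side, v)) for k, v in pairs]


def group_by_key(pairs):
    """Stand-in for a shuffle + groupByKey: collect all values per key."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups


def tagged_join(left, right):
    """Inner join: tag both sides, group the union by key, then emit
    the per-key cross product of left values and right values."""
    grouped = group_by_key(tag(left, LEFT) + tag(right, RIGHT))
    out = []
    for k, tagged_values in grouped.items():
        ls = [v for side, v in tagged_values if side == LEFT]
        rs = [v for side, v in tagged_values if side == RIGHT]
        # A key present on only one side produces no output (inner join).
        out.extend((k, (l, r)) for l, r in product(ls, rs))
    return out


if __name__ == "__main__":
    left = [("a", 1), ("a", 2), ("b", 3)]
    right = [("a", "x"), ("c", "y")]
    print(sorted(tagged_join(left, right)))
```

Running it prints `[('a', (1, 'x')), ('a', (2, 'x'))]`. A combineByKey-style variant would instead build the per-key (left values, right values) lists directly during aggregation rather than splitting a grouped list afterwards.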

> Shuffle based join
> ------------------
>
>                 Key: SPARK-6169
>                 URL: https://issues.apache.org/jira/browse/SPARK-6169
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>            Reporter: Mridul Muralidharan
>            Priority: Minor
>
> Leverage the improved Spark shuffle to do the join more efficiently; the
> current implementation using cogroup can be significantly improved.
> Unfortunately, I won't be able to work on or contribute to this. It would be
> great if someone else could pick it up; I wanted to ensure this task is not
> missed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)