[ https://issues.apache.org/jira/browse/SPARK-6169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348995#comment-14348995 ]
Mridul Muralidharan commented on SPARK-6169:
--------------------------------------------

I have not done much research into how it can be improved, but the join code we have is fairly old and predates the shuffle improvements made since then. Even naive tagging of tuples followed by a group or combine by key has been seen to be much more efficient than the current use of the external append-only map when sort-based shuffle is on.

I am sure the folks working on Spark SQL would have much more experience improving joins!

> Shuffle based join
> ------------------
>
>                 Key: SPARK-6169
>                 URL: https://issues.apache.org/jira/browse/SPARK-6169
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>            Reporter: Mridul Muralidharan
>            Priority: Minor
>
> Leverage the improved Spark shuffle to do the join more efficiently - the current
> implementation using cogroup can be significantly improved.
> Unfortunately I won't be able to work on or contribute to this; it would be great if
> someone else can pick this up - I wanted to ensure this task is not missed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
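
For illustration, the "naive tagging of tuples followed by group or combine by key" idea mentioned in the comment can be sketched in plain Python. This is only a single-process model of the technique, not Spark's join code: the function name `tagged_join` and the `LEFT`/`RIGHT` tags are hypothetical, and the in-memory dictionary stands in for the shuffle that would group records by key across partitions.

```python
from collections import defaultdict
from itertools import product

# Hypothetical side tags; they double as indexes into the per-key buffers.
LEFT, RIGHT = 0, 1


def tagged_join(left, right):
    """Inner-join two iterables of (key, value) pairs by tagging and grouping."""
    # Step 1: tag every record with the side it came from and merge both
    # inputs into a single stream of (key, (tag, value)) records.
    tagged = [(k, (LEFT, v)) for k, v in left]
    tagged += [(k, (RIGHT, v)) for k, v in right]

    # Step 2: group by key. In a distributed setting this grouping is what
    # the shuffle would do; here a dictionary plays that role.
    groups = defaultdict(lambda: ([], []))
    for k, (tag, v) in tagged:
        groups[k][tag].append(v)

    # Step 3: for each key, emit the cross product of left and right values.
    # Keys present on only one side produce nothing (inner-join semantics).
    joined = []
    for k, (left_values, right_values) in groups.items():
        joined.extend((k, pair) for pair in product(left_values, right_values))
    return joined


if __name__ == "__main__":
    orders = [("a", 1), ("b", 2), ("a", 3)]
    names = [("a", "x"), ("c", "y")]
    print(sorted(tagged_join(orders, names)))
```

The point of the sketch is that both inputs travel through one grouping pass keyed on the join key, with the tag telling the reducer side which buffer each value belongs to. Whether this beats the cogroup-based path in a real cluster depends on the shuffle implementation, as the comment itself notes.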