[ https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106253#comment-14106253 ]
Brock Noland commented on HIVE-7384:
------------------------------------

1) I noticed recently that the latest Hive does a total order sort when there is more than one reducer: https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java#L374
2) Should we do some investigation into Tez auto-parallelism (HIVE-7158)?

Let me know your thoughts.

> Research into reduce-side join [Spark Branch]
> ---------------------------------------------
>
>                 Key: HIVE-7384
>                 URL: https://issues.apache.org/jira/browse/HIVE-7384
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Szehon Ho
>         Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt,
> sales_products.txt, sales_stores.txt
>
>
> Hive's join operator is very sophisticated, especially for reduce-side join.
> While we expect that other types of join, such as map-side join and SMB
> map-side join, will work out of the box with our design, there may be some
> complications in reduce-side join, which extensively utilizes key tags and
> shuffle behavior. Our design principle prefers making the Hive implementation
> work out of the box as well, which might require new functionality from Spark.
> The task is to research this area, identifying requirements for the Spark
> community and the work to be done on Hive to make reduce-side join work.
> A design doc might be needed for this. For more information, please refer to
> the overall design doc on the wiki.
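
For readers unfamiliar with the "key tag and shuffle behavior" mentioned in the issue description, here is a minimal sketch of a tagged reduce-side inner join. It is plain Python, not Hive or Spark code: the function names (`map_side`, `shuffle`, `reduce_side`) and the sample rows are hypothetical, and a single in-memory sort only stands in for a real shuffle.

```python
from itertools import groupby
from operator import itemgetter


def map_side(tag, rows, key_index):
    """Emit (join_key, tag, row) so the table tag travels with the key."""
    for row in rows:
        yield (row[key_index], tag, row)


def shuffle(records):
    """Stand-in for the shuffle: sort by (key, tag) so that each key's
    rows arrive together, with lower-tagged rows first."""
    return sorted(records, key=itemgetter(0, 1))


def reduce_side(sorted_records):
    """Inner join: buffer the tag-0 rows of a key, then stream the
    tag-1 rows of the same key against that buffer."""
    for _key, group in groupby(sorted_records, key=itemgetter(0)):
        buffered = []
        for _, tag, row in group:
            if tag == 0:
                buffered.append(row)
            else:
                for left in buffered:
                    yield left + row


# Made-up sample data: (product_id, name) and (product_id, quantity).
products = [(1, "apple"), (2, "pear")]
sales = [(1, 10), (1, 7), (3, 4)]

records = list(map_side(0, products, 0)) + list(map_side(1, sales, 0))
joined = list(reduce_side(shuffle(records)))
```

The sketch depends on the shuffle delivering rows grouped by key and ordered by tag within each key. That ordering is the kind of behavior the issue says a reduce-side join relies on.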