[jira] [Commented] (HIVE-7384) Research into reduce-side join [Spark Branch]

Brock Noland (JIRA) Thu, 21 Aug 2014 17:46:51 -0700

    [ https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106253#comment-14106253 ]


Brock Noland commented on HIVE-7384:
------------------------------------

1) I noticed recently that the latest Hive does a total order sort when there is
more than one reducer:
https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java#L374

2) Should we do some investigation into Tez auto-parallelism (HIVE-7158)? Let me
know your thoughts.
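A total order sort means each reducer receives a contiguous, non-overlapping key range, so concatenating the reducers' sorted outputs yields one globally sorted result. Hadoop's TotalOrderPartitioner follows this range-partitioning principle using split points read from a partition file. The Python sketch below only simulates the idea in memory. It is not Hive's code: the sampling scheme and the helper names (`sample_split_points`, `range_partition`, `total_order_sort`) are hypothetical.

```python
import bisect
import random


def sample_split_points(keys, num_reducers, sample_size=100, seed=0):
    """Choose num_reducers - 1 range boundaries from a random sample of the keys (hypothetical helper)."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Evenly spaced quantiles of the sample become the partition boundaries.
    return [sample[(i * len(sample)) // num_reducers] for i in range(1, num_reducers)]


def range_partition(key, split_points):
    """Index of the reducer whose key range contains `key` (ranges are half-open)."""
    return bisect.bisect_right(split_points, key)


def total_order_sort(keys, num_reducers):
    """Simulate a multi-reducer sort whose concatenated output is globally ordered."""
    split_points = sample_split_points(keys, num_reducers)
    partitions = [[] for _ in range(num_reducers)]
    for key in keys:
        partitions[range_partition(key, split_points)].append(key)
    # Each reducer sorts only its own partition, independently of the others.
    reducer_outputs = [sorted(part) for part in partitions]
    # Partitions are disjoint, ordered key ranges, so concatenation is sorted.
    return [key for output in reducer_outputs for key in output]
```

With a plain hash partitioner, each reducer's output would be sorted only locally. The range boundaries are what make the concatenated output globally ordered.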

> Research into reduce-side join [Spark Branch]
> ---------------------------------------------
>
>                 Key: HIVE-7384
>                 URL: https://issues.apache.org/jira/browse/HIVE-7384
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Szehon Ho
>         Attachments: Hive on Spark Reduce Side Join.docx, sales_items.txt, 
> sales_products.txt, sales_stores.txt
>
>
> Hive's join operator is very sophisticated, especially for reduce-side join. 
> While we expect that other types of join, such as map-side join and SMB 
> map-side join, will work out of the box with our design, there may be some 
> complications in reduce-side join, which extensively utilizes key tags and 
> shuffle behavior. Our design principle prefers making the Hive implementation 
> work out of the box as well, which might require new functionality from Spark. 
> The task is to research this area, identifying requirements for the Spark 
> community and the work to be done on Hive to make reduce-side join work.
> A design doc might be needed for this. For more information, please refer to 
> the overall design doc on the wiki.
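The key-tag mechanism mentioned in the description can be illustrated with a small in-memory simulation. The sketch assumes a common reduce-side join strategy. The map side tags each row with the index of its source table. The shuffle groups rows by join key and orders them by tag within each key. The reducer buffers rows from every table except the last one, then streams the last table's rows and emits the inner-join combinations. The function names are hypothetical, and this is not Hive's or Spark's actual implementation.

```python
from itertools import groupby, product
from operator import itemgetter


def map_side(tables):
    """Emit (join_key, tag, value) records; the tag is the index of the source table."""
    for tag, rows in enumerate(tables):
        for key, value in rows:
            yield key, tag, value


def shuffle(records):
    """Simulate the shuffle: group by join key and order by tag within each key."""
    # Sort on (key, tag) only; the stable sort keeps each table's original row order.
    return sorted(records, key=itemgetter(0, 1))


def reduce_side_join(tables):
    """Inner-join several (key, value) tables on key; the last table is streamed."""
    last_tag = len(tables) - 1
    output = []
    for key, group in groupby(shuffle(map_side(tables)), key=itemgetter(0)):
        # One in-memory buffer per non-streamed table, reset for every key.
        buffers = [[] for _ in range(last_tag)]
        for _, tag, value in group:
            if tag < last_tag:
                buffers[tag].append(value)
            else:
                # All earlier tags have arrived, so join this row with every buffered combination.
                for combo in product(*buffers):
                    output.append((key, *combo, value))
    return output
```

Hive's join documentation states that the last table in a join is streamed through the reducers while the others are buffered. Placing the largest table last therefore keeps the buffers small. The sketch shows why that only works if the shuffle delivers each key's rows ordered by tag.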



--
This message was sent by Atlassian JIRA
(v6.2#6252)
