[ 
https://issues.apache.org/jira/browse/HIVE-7384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7384:
------------------------------

    Description: 
Hive's join operator is very sophisticated, especially for reduce-side join. 
While we expect that other types of join, such as map-side join and SMB 
map-side join, will work out of the box with our design, there may be some 
complications in reduce-side join, which makes extensive use of key tags and 
shuffle behavior. Our design principle is to make the Hive implementation 
work out of the box as well, which might require new functionality from Spark. 
The task is to research this area, identifying requirements for the Spark 
community and the work to be done on Hive to make reduce-side join work.

A design doc might be needed for this. For more information, please refer to 
the overall design doc on the wiki.

  was:
Hive's join operator is very sophisticated, especially for reduce-side join. 
While we expect that other types of join, such as map-side join and SMB 
map-side join, will work out of the box with our design, there may be some 
complication in reduce-side join, which extensively utilizes key tag and 
shuffle behavior. Our design principle prefer to make Hive implementation work 
out of box also, which might requires new functionality from Spark. The tasks 
is to research into this area, identifying requirements for Spark community and 
work to be done on Hive to make reduce-side join work.

A design doc might be needed for this. For more information, please refer to 
the overall design doc on wiki.


> Research into reduce-side join
> ------------------------------
>
>                 Key: HIVE-7384
>                 URL: https://issues.apache.org/jira/browse/HIVE-7384
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> Hive's join operator is very sophisticated, especially for reduce-side join. 
> While we expect that other types of join, such as map-side join and SMB 
> map-side join, will work out of the box with our design, there may be some 
> complications in reduce-side join, which makes extensive use of key tags and 
> shuffle behavior. Our design principle is to make the Hive implementation 
> work out of the box as well, which might require new functionality from Spark. 
> The task is to research this area, identifying requirements for the Spark 
> community and the work to be done on Hive to make reduce-side join work.
> A design doc might be needed for this. For more information, please refer to 
> the overall design doc on the wiki.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
