[ 
https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-10673:
------------------------------
    Description: 
Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found about 2/3 
of the CPU was spent during sorting/merging.
While this does not work for MR, for other execution engines (such as Tez), it 
is possible to create a reduce-side join that uses unsorted inputs in order to 
eliminate the sorting, which may be faster than a shuffle join. To join on 
unsorted inputs, we can use the hash join algorithm to perform the join in the 
reducer. This will require the small tables in the join to fit in the 
reducer/hash table for this to work.

  was:Reduce-side hash join (using MapJoinOperator), where the Tez inputs to 
the reducer are unsorted.


> Dynamically partitioned hash join for Tez
> -----------------------------------------
>
>                 Key: HIVE-10673
>                 URL: https://issues.apache.org/jira/browse/HIVE-10673
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Planning, Query Processor
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, 
> HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch
>
>
> Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found about 
> 2/3 of the CPU was spent during sorting/merging.
> While this does not work for MR, for other execution engines (such as Tez), 
> it is possible to create a reduce-side join that uses unsorted inputs in 
> order to eliminate the sorting, which may be faster than a shuffle join. To 
> join on unsorted inputs, we can use the hash join algorithm to perform the 
> join in the reducer. This will require the small tables in the join to fit in 
> the reducer/hash table for this to work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to