[jira] [Commented] (HIVE-8700) Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]

Szehon Ho (JIRA) Mon, 03 Nov 2014 15:54:07 -0800

    [ https://issues.apache.org/jira/browse/HIVE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195393#comment-14195393 ]


Szehon Ho commented on HIVE-8700:
---------------------------------

Hi [~ssatish], can you share what you were planning?  

I was chatting with [~csun], and he mentioned that SparkMapJoinResolver may be 
able to handle it (by modifying the code of MapJoinResolver).  Do you think it 
would be better to do it there, in the common logic, so that all the input 
resolvers (like SMB, skew, and hinted mapjoin) can also take advantage of it?  
Thanks.



> Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-8700
>                 URL: https://issues.apache.org/jira/browse/HIVE-8700
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Suhas Satish
>
> With HIVE-8616 enabled, the new plan has ReduceSinkOperator for the small 
> tables. For example, the following represents the operator plan for the small 
> table dec1 derived from the query {code}explain select /*+ MAPJOIN(dec)*/ * from 
> dec join dec1 on dec.value=dec1.d;{code}
> {code}
>         Map 2 
>             Map Operator Tree:
>                 TableScan
>                   alias: dec1
>                   Statistics: Num rows: 0 Data size: 107 Basic stats: PARTIAL Column stats: NONE
>                   Filter Operator
>                     predicate: d is not null (type: boolean)
>                     Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: d (type: decimal(5,2))
>                       sort order: +
>                       Map-reduce partition columns: d (type: decimal(5,2))
>                       Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>                       value expressions: i (type: int)
> {code}
> With the new design for broadcasting small tables, we need to replace the 
> ReduceSinkOperator with HashTableSinkOperator or an equivalent in the new plan.
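
To make the requested rewrite concrete, here is a rough sketch of the tree transformation on a toy operator model. Everything in it is an assumption for illustration: the Op class, the string kind labels, and the replaceReduceSinks and render helpers are hypothetical stand-ins, not Hive's operator classes or resolver API. It only shows the shape of the change: walk the small-table branch and swap the terminal ReduceSink node for a HashTableSink node.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical stand-ins, not Hive's operator classes.
public class PlanRewriteSketch {

    /** A minimal operator node: a kind label plus child operators. */
    static class Op {
        final String kind;
        final List<Op> children = new ArrayList<>();

        Op(String kind, Op... kids) {
            this.kind = kind;
            for (Op k : kids) {
                children.add(k);
            }
        }
    }

    /**
     * Walks the tree below root and swaps every "ReduceSink" child for a
     * "HashTableSink" node, keeping the replaced node's own children.
     * Returns the number of nodes replaced.
     */
    static int replaceReduceSinks(Op root) {
        int replaced = 0;
        for (int i = 0; i < root.children.size(); i++) {
            Op child = root.children.get(i);
            if (child.kind.equals("ReduceSink")) {
                Op sink = new Op("HashTableSink");
                sink.children.addAll(child.children);
                root.children.set(i, sink);
                child = sink;
                replaced++;
            }
            // Recurse so nested branches are rewritten as well.
            replaced += replaceReduceSinks(child);
        }
        return replaced;
    }

    /** Renders the chain of first children, e.g. "TableScan -> Filter". */
    static String render(Op op) {
        StringBuilder sb = new StringBuilder(op.kind);
        while (!op.children.isEmpty()) {
            op = op.children.get(0);
            sb.append(" -> ").append(op.kind);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Mirrors the small-table branch in the plan: TableScan -> Filter -> ReduceSink.
        Op plan = new Op("TableScan", new Op("Filter", new Op("ReduceSink")));
        replaceReduceSinks(plan);
        System.out.println(render(plan)); // prints "TableScan -> Filter -> HashTableSink"
    }
}
```

In the real planner the replacement operator would also need to carry over the key and value expressions of the ReduceSink it replaces; the toy model leaves that out.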



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
