Xuefu Zhang created HIVE-8700:
---------------------------------
Summary: Replace ReduceSink to HashTableSink (or equi.) for small
tables [Spark Branch]
Key: HIVE-8700
URL: https://issues.apache.org/jira/browse/HIVE-8700
Project: Hive
Issue Type: Sub-task
Components: Spark
Reporter: Xuefu Zhang
Assignee: Szehon Ho
With HIVE-8616 enabled, the new plan uses ReduceSinkOperator for the small
tables. For example, the following is the operator plan for the small
table dec1, derived from the query {code}explain select /*+ MAPJOIN(dec)*/ * from
dec join dec1 on dec.value=dec1.d;{code}
{code}
Map 2
    Map Operator Tree:
        TableScan
          alias: dec1
          Statistics: Num rows: 0 Data size: 107 Basic stats: PARTIAL Column stats: NONE
          Filter Operator
            predicate: d is not null (type: boolean)
            Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
            Reduce Output Operator
              key expressions: d (type: decimal(5,2))
              sort order: +
              Map-reduce partition columns: d (type: decimal(5,2))
              Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
              value expressions: i (type: int)
{code}
With the new design for broadcasting small tables, we need to replace the
ReduceSinkOperator with HashTableSinkOperator or an equivalent in the new plan.
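
To illustrate the intended rewrite, here is a minimal, language-neutral sketch in Python. It is not Hive code: the Operator class, the replace_reduce_sinks function, and the attribute names are all hypothetical stand-ins. It models the small-table branch shown above (where the Reduce Output Operator is the ReduceSinkOperator) as a toy tree and swaps the terminal sink. The choice to carry over only the key and value expressions is an assumption made for the sketch, not something this issue specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Operator:
    """Toy stand-in for a plan operator (hypothetical, not Hive's Operator class)."""
    name: str
    attrs: Dict[str, object] = field(default_factory=dict)
    children: List["Operator"] = field(default_factory=list)


def replace_reduce_sinks(op: Operator) -> Operator:
    """Return a copy of the tree with each ReduceSink swapped for a HashTableSink."""
    if op.name == "ReduceSink":
        # Assumption for this sketch: keep only the join key and value
        # expressions, and drop the sort order and partition columns.
        return Operator(
            "HashTableSink",
            {"keys": op.attrs.get("keys"), "values": op.attrs.get("values")},
        )
    # Copy every other operator unchanged and recurse into its children.
    return Operator(
        op.name,
        dict(op.attrs),
        [replace_reduce_sinks(child) for child in op.children],
    )


def render(op: Operator, depth: int = 0) -> str:
    """Render the tree as indented operator names, one per line."""
    lines = ["  " * depth + op.name]
    for child in op.children:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


# The small-table branch for dec1 from the plan above, in toy form.
small_table_plan = Operator(
    "TableScan",
    {"alias": "dec1"},
    [
        Operator(
            "Filter",
            {"predicate": "d is not null"},
            [
                Operator(
                    "ReduceSink",
                    {
                        "keys": ["d"],
                        "values": ["i"],
                        "sort_order": "+",
                        "partition_columns": ["d"],
                    },
                )
            ],
        )
    ],
)

new_plan = replace_reduce_sinks(small_table_plan)
print(render(new_plan))
```

The function builds a new tree rather than mutating the input, so the original plan remains available for comparison; whether the real change should edit the plan in place is left open here.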