[ https://issues.apache.org/jira/browse/HIVE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Szehon Ho updated HIVE-8700:
----------------------------
       Resolution: Fixed
    Fix Version/s: spark-branch
           Status: Resolved  (was: Patch Available)

Committed to spark. Thanks a lot to Suhas for the fix!

> Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-8700
>                 URL: https://issues.apache.org/jira/browse/HIVE-8700
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Suhas Satish
>             Fix For: spark-branch
>
>         Attachments: HIVE-8700-spark.patch, HIVE-8700.2-spark.patch, HIVE-8700.3-spark.patch, HIVE-8700.patch
>
>
> With HIVE-8616 enabled, the new plan has a ReduceSinkOperator for each small table. For example, the following is the operator plan for the small table dec1, derived from the query
> {code}explain select /*+ MAPJOIN(dec)*/ * from dec join dec1 on dec.value=dec1.d;{code}
> {code}
> Map 2
>     Map Operator Tree:
>         TableScan
>           alias: dec1
>           Statistics: Num rows: 0 Data size: 107 Basic stats: PARTIAL Column stats: NONE
>           Filter Operator
>             predicate: d is not null (type: boolean)
>             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>             Reduce Output Operator
>               key expressions: d (type: decimal(5,2))
>               sort order: +
>               Map-reduce partition columns: d (type: decimal(5,2))
>               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>               value expressions: i (type: int)
> {code}
> With the new design for broadcasting small tables, we need to replace the ReduceSinkOperator with a HashTableSinkOperator or an equivalent in the new plan.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
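
For readers unfamiliar with the change requested in the description, here is a toy sketch of the plan rewrite: walk the small table's operator tree and swap each ReduceSink for a HashTableSink. This is not the committed patch and does not use Hive's real classes. `Operator`, `replace_reduce_sinks`, `kinds`, and the property names are hypothetical. The sketch also assumes a hash table sink needs only the key and value expressions, so it drops the sort order and partition columns.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    """Toy stand-in for a plan operator (hypothetical, not Hive's Operator class)."""
    kind: str
    props: dict = field(default_factory=dict)
    children: List["Operator"] = field(default_factory=list)


def replace_reduce_sinks(op: Operator) -> Operator:
    """Return a copy of the tree with every ReduceSink replaced by a HashTableSink."""
    if op.kind == "ReduceSink":
        # Assumption: only the key and value expressions matter for building
        # a hash table, so shuffle-related properties are dropped.
        kept = {k: v for k, v in op.props.items() if k in ("keys", "values")}
        # The sink is treated as the terminal operator of the small-table branch.
        return Operator("HashTableSink", kept, [])
    return Operator(op.kind, dict(op.props),
                    [replace_reduce_sinks(child) for child in op.children])


def kinds(op: Operator) -> List[str]:
    """List operator kinds in pre-order, for inspecting a plan."""
    return [op.kind] + [k for child in op.children for k in kinds(child)]


# The "Map 2" branch for dec1 from the description, in toy form.
small_table_plan = Operator("TableScan", {"alias": "dec1"}, [
    Operator("Filter", {"predicate": "d is not null"}, [
        Operator("ReduceSink", {
            "keys": ["d"],
            "values": ["i"],
            "sort_order": "+",
            "partition_columns": ["d"],
        }),
    ]),
])

new_plan = replace_reduce_sinks(small_table_plan)
print(kinds(new_plan))
```

The function returns a new tree and leaves the input plan unchanged, which keeps the original plan available for comparison.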