[ https://issues.apache.org/jira/browse/HIVE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201198#comment-14201198 ]
Suhas Satish commented on HIVE-8700:
------------------------------------

I have a patch that now generates HashTableSinkOperators for the small tables, as shown in the plan below. I will upload it soon.

explain select table1.key, table2.value, table3.value from table1 join table2 on table1.key=table2.key join table3 on table1.key=table3.key;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Spark
      Edges:
        Map 3 <- Map 1 (NONE, 0), Map 2 (NONE, 0)
      DagName: ssatish_20141106152828_299c0f54-40a8-4cf5-91f4-ecb1f420955f:1
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: table1
                  Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 727 Data size: 2908 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key}
                        1 {value}
                        2 {value}
                      keys:
                        0 key (type: int)
                        1 key (type: int)
                        2 key (type: int)
        Map 2
            Map Operator Tree:
                TableScan
                  alias: table3
                  Statistics: Num rows: 2 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 108 Basic stats: COMPLETE Column stats: NONE
                    HashTable Sink Operator
                      condition expressions:
                        0 {key}
                        1 {value}
                        2 {value}
                      keys:
                        0 key (type: int)
                        1 key (type: int)
                        2 key (type: int)
        Map 3
            Map Operator Tree:
                TableScan
                  alias: table2
                  Statistics: Num rows: 55 Data size: 5791 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 28 Data size: 2948 Basic stats: COMPLETE Column stats: NONE
                    Map Join Operator
                      condition map:
                           Inner Join 0 to 1
                           Inner Join 0 to 2
                      condition expressions:
                        0 {key}
                        1 {value}
                        2 {value}
                      keys:
                        0 key (type: int)
                        1 key (type: int)
                        2 key (type: int)
                      outputColumnNames: _col0, _col6, _col11
                      input vertices:
                        0 Map 1
                        2 Map 2
                      Statistics: Num rows: 1599 Data size: 6397 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: int), _col6 (type: string), _col11 (type: string)
                        outputColumnNames: _col0, _col1, _col2
                        Statistics: Num rows: 1599 Data size: 6397 Basic stats: COMPLETE Column stats: NONE
                        File Output Operator
                          compressed: false
                          Statistics: Num rows: 1599 Data size: 6397 Basic stats: COMPLETE Column stats: NONE
                          table:
                              input format: org.apache.hadoop.mapred.TextInputFormat
                              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

> Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-8700
>                 URL: https://issues.apache.org/jira/browse/HIVE-8700
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Suhas Satish
>         Attachments: HIVE-8700-spark.patch, HIVE-8700.patch
>
>
> With HIVE-8616 enabled, the new plan has ReduceSinkOperator for the small
> tables. For example, the following represents the operator plan for the small
> table dec1 derived from query {code}explain select /*+ MAPJOIN(dec)*/ * from
> dec join dec1 on dec.value=dec1.d;{code}
> {code}
>         Map 2
>             Map Operator Tree:
>                 TableScan
>                   alias: dec1
>                   Statistics: Num rows: 0 Data size: 107 Basic stats: PARTIAL Column stats: NONE
>                   Filter Operator
>                     predicate: d is not null (type: boolean)
>                     Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: d (type: decimal(5,2))
>                       sort order: +
>                       Map-reduce partition columns: d (type: decimal(5,2))
>                       Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>                       value expressions: i (type: int)
> {code}
> With the new design for broadcasting small tables, we need to replace the
> ReduceSinkOperator with HashTableSinkOperator or equivalent in the new plan.
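
For readers unfamiliar with the operators in the plan above, here is a minimal Python sketch of the general broadcast hash join idea: the small tables (table1 and table3) are each loaded into an in-memory hash table keyed on the join column, and the remaining table (table2) is streamed and probes both. This is an illustration only, not Hive code and not the patch; the function names `build_hash_table` and `map_join` and the sample rows are hypothetical.

```python
from collections import defaultdict

def build_hash_table(rows, key_index, value_index):
    """Small-table side: group the needed column by join key, skipping NULL keys."""
    table = defaultdict(list)
    for row in rows:
        key = row[key_index]
        if key is not None:  # mirrors the "key is not null" filter in the plan
            table[key].append(row[value_index])
    return table

def map_join(table2_rows, table1_ht, table3_ht):
    """Streamed side: read table2 row by row and probe both hash tables."""
    for key, value2 in table2_rows:
        if key is None:
            continue
        # Inner join: a row is emitted only when both small tables have the key.
        for key1 in table1_ht.get(key, []):
            for value3 in table3_ht.get(key, []):
                yield (key1, value2, value3)

# Hypothetical sample data: (key, value) pairs.
table1 = [(1, "a"), (2, "b"), (None, "x")]
table2 = [(1, "t2-one"), (2, "t2-two"), (3, "t2-three")]
table3 = [(1, "t3-one"), (1, "t3-uno")]

table1_ht = build_hash_table(table1, 0, 0)  # the query needs table1.key
table3_ht = build_hash_table(table3, 0, 1)  # the query needs table3.value

# Corresponds to: select table1.key, table2.value, table3.value ...
result = list(map_join(table2, table1_ht, table3_ht))
print(result)
```

In this sketch only key 1 appears in all three tables, so only its rows are emitted. No shuffle or sort of the small tables is needed, which is the motivation for replacing ReduceSinkOperator on those branches.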