BELUGA BEHR created HIVE-16792: ---------------------------------- Summary: Estimate Rows When Joining BIGINT to INT Column Key: HIVE-16792 URL: https://issues.apache.org/jira/browse/HIVE-16792 Project: Hive Issue Type: Improvement Affects Versions: 2.1.1 Reporter: BELUGA BEHR Priority: Minor
{code:sql} create table test1 (a int); create table test2 (z bigint); INSERT INTO test1 VALUES (1); INSERT INTO test2 VALUES (2147483648); analyze table test1 compute statistics for columns; analyze table test2 compute statistics for columns; EXPLAIN SELECT * FROM test1 t1 INNER JOIN test2 t2 ON t1.a=t2.z; {code} {code} Explain STAGE DEPENDENCIES: Stage-4 is a root stage Stage-3 depends on stages: Stage-4 Stage-0 depends on stages: Stage-3 "" STAGE PLANS: Stage: Stage-4 Map Reduce Local Work Alias -> Map Local Tables: t2 Fetch Operator limit: -1 Alias -> Map Local Operator Tree: t2 TableScan alias: t2 filterExpr: z is not null (type: boolean) Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE Filter Operator predicate: z is not null (type: boolean) Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE HashTable Sink Operator keys: 0 UDFToLong(a) (type: bigint) 1 z (type: bigint) Stage: Stage-3 Map Reduce Map Operator Tree: TableScan alias: t1 filterExpr: UDFToLong(a) is not null (type: boolean) Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: UDFToLong(a) is not null (type: boolean) Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE Map Join Operator condition map: Inner Join 0 to 1 keys: 0 UDFToLong(a) (type: bigint) 1 z (type: bigint) outputColumnNames: _col0, _col4" Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: int), _col4 (type: bigint)" outputColumnNames: _col0, _col1" Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: ListSink {code} I would expect that perhaps Hive would be smart enough to know that this join is not going to produce any rows because the MIN VALUE of table test2 is more than INTEGER.MAX_VALUE. -- This message was sent by Atlassian JIRA (v6.3.15#6346)