BELUGA BEHR created HIVE-16792:
----------------------------------
Summary: Estimate Rows When Joining BIGINT to INT Column
Key: HIVE-16792
URL: https://issues.apache.org/jira/browse/HIVE-16792
Project: Hive
Issue Type: Improvement
Affects Versions: 2.1.1
Reporter: BELUGA BEHR
Priority: Minor
{code:sql}
create table test1
(a int);
create table test2
(z bigint);
INSERT INTO test1 VALUES (1);
INSERT INTO test2 VALUES (2147483648);
analyze table test1 compute statistics for columns;
analyze table test2 compute statistics for columns;
EXPLAIN SELECT * FROM test1 t1 INNER JOIN test2 t2 ON t1.a=t2.z;
{code}
{code}
Explain
STAGE DEPENDENCIES:
Stage-4 is a root stage
Stage-3 depends on stages: Stage-4
Stage-0 depends on stages: Stage-3
""
STAGE PLANS:
Stage: Stage-4
Map Reduce Local Work
Alias -> Map Local Tables:
t2
Fetch Operator
limit: -1
Alias -> Map Local Operator Tree:
t2
TableScan
alias: t2
filterExpr: z is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column
stats: COMPLETE
Filter Operator
predicate: z is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column
stats: COMPLETE
HashTable Sink Operator
keys:
0 UDFToLong(a) (type: bigint)
1 z (type: bigint)
Stage: Stage-3
Map Reduce
Map Operator Tree:
TableScan
alias: t1
filterExpr: UDFToLong(a) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column
stats: NONE
Filter Operator
predicate: UDFToLong(a) is not null (type: boolean)
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column
stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 UDFToLong(a) (type: bigint)
1 z (type: bigint)
outputColumnNames: _col0, _col4"
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE
Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col4 (type: bigint)"
outputColumnNames: _col0, _col1"
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE
Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Local Work:
Map Reduce Local Work
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
I would expect that perhaps Hive would be smart enough to know that this join
is not going to produce any rows because the MIN VALUE of table test2 is more
than INTEGER.MAX_VALUE.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)