[
https://issues.apache.org/jira/browse/HIVE-16792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16030343#comment-16030343
]
Pengcheng Xiong commented on HIVE-16792:
----------------------------------------
ccing [~jcamachorodriguez]
> Estimate Rows When Joining BIGINT to INT Column
> -----------------------------------------------
>
> Key: HIVE-16792
> URL: https://issues.apache.org/jira/browse/HIVE-16792
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 2.1.1
> Reporter: BELUGA BEHR
> Priority: Minor
>
> {code:sql}
> create table test1
> (a int);
> create table test2
> (z bigint);
> INSERT INTO test1 VALUES (1);
> INSERT INTO test2 VALUES (2147483648);
> analyze table test1 compute statistics for columns;
> analyze table test2 compute statistics for columns;
> EXPLAIN SELECT * FROM test1 t1 INNER JOIN test2 t2 ON t1.a=t2.z;
> {code}
> {code}
> Explain
> STAGE DEPENDENCIES:
> Stage-4 is a root stage
> Stage-3 depends on stages: Stage-4
> Stage-0 depends on stages: Stage-3
> ""
> STAGE PLANS:
> Stage: Stage-4
> Map Reduce Local Work
> Alias -> Map Local Tables:
> t2
> Fetch Operator
> limit: -1
> Alias -> Map Local Operator Tree:
> t2
> TableScan
> alias: t2
> filterExpr: z is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column
> stats: COMPLETE
> Filter Operator
> predicate: z is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL
> Column stats: COMPLETE
> HashTable Sink Operator
> keys:
> 0 UDFToLong(a) (type: bigint)
> 1 z (type: bigint)
> Stage: Stage-3
> Map Reduce
> Map Operator Tree:
> TableScan
> alias: t1
> filterExpr: UDFToLong(a) is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column
> stats: NONE
> Filter Operator
> predicate: UDFToLong(a) is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE
> Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 UDFToLong(a) (type: bigint)
> 1 z (type: bigint)
> outputColumnNames: _col0, _col4"
> Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE
> Column stats: NONE
> Select Operator
> expressions: _col0 (type: int), _col4 (type: bigint)"
> outputColumnNames: _col0, _col1"
> Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE
> Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 1 Basic stats:
> COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Local Work:
> Map Reduce Local Work
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
> I would expect that perhaps Hive would be smart enough to know that this join
> is not going to produce any rows because the MIN VALUE of table test2 is more
> than INTEGER.MAX_VALUE.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)