[jira] [Created] (HIVE-16792) Estimate Rows When Joining BIGINT to INT Column

BELUGA BEHR (JIRA) Tue, 30 May 2017 14:39:23 -0700

BELUGA BEHR created HIVE-16792:
----------------------------------

             Summary: Estimate Rows When Joining BIGINT to INT Column
                 Key: HIVE-16792
                 URL: https://issues.apache.org/jira/browse/HIVE-16792
             Project: Hive
          Issue Type: Improvement
    Affects Versions: 2.1.1
            Reporter: BELUGA BEHR
            Priority: Minor



{code:sql}
create table test1
(a int);

create table test2
(z bigint);

INSERT INTO test1 VALUES (1);
INSERT INTO test2 VALUES (2147483648);

analyze table test1 compute statistics for columns;
analyze table test2 compute statistics for columns;

EXPLAIN SELECT * FROM test1 t1 INNER JOIN test2 t2 ON t1.a=t2.z;
{code}

{code}
Explain
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-3 depends on stages: Stage-4
  Stage-0 depends on stages: Stage-3
""
STAGE PLANS:
  Stage: Stage-4
    Map Reduce Local Work
      Alias -> Map Local Tables:
        t2 
          Fetch Operator
            limit: -1
      Alias -> Map Local Operator Tree:
        t2 
          TableScan
            alias: t2
            filterExpr: z is not null (type: boolean)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
stats: COMPLETE
            Filter Operator
              predicate: z is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
stats: COMPLETE
              HashTable Sink Operator
                keys:
                  0 UDFToLong(a) (type: bigint)
                  1 z (type: bigint)

  Stage: Stage-3
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t1
            filterExpr: UDFToLong(a) is not null (type: boolean)
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column 
stats: NONE
            Filter Operator
              predicate: UDFToLong(a) is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column 
stats: NONE
              Map Join Operator
                condition map:
                     Inner Join 0 to 1
                keys:
                  0 UDFToLong(a) (type: bigint)
                  1 z (type: bigint)
                outputColumnNames: _col0, _col4"
                Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
Column stats: NONE
                Select Operator
                  expressions: _col0 (type: int), _col4 (type: bigint)"
                  outputColumnNames: _col0, _col1"
                  Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
      Local Work:
        Map Reduce Local Work

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

I would expect that perhaps Hive would be smart enough to know that this join 
is not going to produce any rows because the MIN VALUE of table test2 is more 
than INTEGER.MAX_VALUE.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Created] (HIVE-16792) Estimate Rows When Joining BIGINT to INT Column

Reply via email to