[ 
https://issues.apache.org/jira/browse/DRILL-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mehant Baid updated DRILL-3121:
-------------------------------
    Issue Type: Improvement  (was: Bug)

> Hive partition pruning is not happening
> ---------------------------------------
>
>                 Key: DRILL-3121
>                 URL: https://issues.apache.org/jira/browse/DRILL-3121
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Flow
>    Affects Versions: 1.0.0
>            Reporter: Hao Zhu
>            Assignee: Mehant Baid
>             Fix For: 1.2.0
>
>
> Tested on 1.0.0 with below commit id, and hive 0.13.
> {code}
> >  select * from sys.version;
> +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> |                 commit_id                 |                           
> commit_message                           |        commit_time         | 
> build_email  |         build_time         |
> +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> | d8b19759657698581cc0d01d7038797952888123  | DRILL-3100: 
> TestImpersonationDisabledWithMiniDFS fails on Windows  | 15.05.2015 @ 
> 01:18:03 EDT  | Unknown      | 15.05.2015 @ 03:07:10 EDT  |
> +-------------------------------------------+--------------------------------------------------------------------+----------------------------+--------------+----------------------------+
> 1 row selected (0.083 seconds)
> {code}
> How to reproduce:
> 1. Use hive to create below partition table:
> {code}
> CREATE TABLE partition_table(id INT, username string)
>  PARTITIONED BY(year STRING, month STRING)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";
> insert into table partition_table PARTITION(year='2014',month='11') select 
> 1,'u' from passwords limit 1;
> insert into table partition_table PARTITION(year='2014',month='12') select 
> 2,'s' from passwords limit 1;
> insert into table partition_table PARTITION(year='2015',month='01') select 
> 3,'e' from passwords limit 1;
> insert into table partition_table PARTITION(year='2015',month='02') select 
> 4,'r' from passwords limit 1;
> insert into table partition_table PARTITION(year='2015',month='03') select 
> 5,'n' from passwords limit 1;
> {code}
> 2. Hive query can do partition pruning for below 2 queries:
> {code}
> hive>  explain EXTENDED select * from partition_table where year='2015' and 
> month in ( '02','03') ;
>             partition values:
>               month 02
>               year 2015
>             partition values:
>               month 03
>               year 2015              
> explain EXTENDED select * from partition_table where year='2015' and (month 
> >= '02' and month <= '03') ;
>             partition values:
>               month 02
>               year 2015
>             partition values:
>               month 03
>               year 2015
> {code}
> Hive only scans 2 partitions -- 2015/02 and 2015/03.
> 3. Drill can not do partition pruning for below 2 queries:
> {code}
> > explain plan for select * from hive.partition_table where `year`='2015' and 
> > `month` in ('02','03');
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(id=[$0], username=[$1], year=[$2], month=[$3])
> 00-02        SelectionVectorRemover
> 00-03          Filter(condition=[AND(=($2, '2015'), OR(=($3, '02'), =($3, 
> '03')))])
> 00-04            Scan(groupscan=[HiveScan [table=Table(dbName:default, 
> tableName:partition_table), 
> inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/000000_0:0+4,
>  maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4, 
> maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/000000_0:0+4], 
> columns=[`*`], partitions= [Partition(values:[2015, 01]), 
> Partition(values:[2015, 02]), Partition(values:[2015, 03])]]])
> > explain plan for select * from hive.partition_table where `year`='2015' and 
> > (`month` >= '02' and `month` <= '03' );
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(id=[$0], username=[$1], year=[$2], month=[$3])
> 00-02        SelectionVectorRemover
> 00-03          Filter(condition=[AND(=($2, '2015'), >=($3, '02'), <=($3, 
> '03'))])
> 00-04            Scan(groupscan=[HiveScan [table=Table(dbName:default, 
> tableName:partition_table), 
> inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=01/000000_0:0+4,
>  maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4, 
> maprfs:/user/hive/warehouse/partition_table/year=2015/month=03/000000_0:0+4], 
> columns=[`*`], partitions= [Partition(values:[2015, 01]), 
> Partition(values:[2015, 02]), Partition(values:[2015, 03])]]])
> {code}
> Drill scans 3 partitions -- 2015/01, 2015/02 and 2015/03.
> Note: if the inlist only has 1 value, Drill can do partition pruning well:
> {code}
> >  explain plan for select * from hive.partition_table where `year`='2015' 
> > and `month` in ('02');
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(id=[$0], username=[$1], year=[$2], month=[$3])
> 00-02        Scan(groupscan=[HiveScan [table=Table(dbName:default, 
> tableName:partition_table), 
> inputSplits=[maprfs:/user/hive/warehouse/partition_table/year=2015/month=02/000000_0:0+4],
>  columns=[`*`], partitions= [Partition(values:[2015, 02])]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to