[jira] [Commented] (DRILL-5357) Partition pruning information not available in query plan for COUNT aggregate query

Khurram Faraaz (JIRA) Wed, 20 Sep 2017 17:32:20 -0700

    [ https://issues.apache.org/jira/browse/DRILL-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16174052#comment-16174052 ]


Khurram Faraaz commented on DRILL-5357:
---------------------------------------

Verified fix on Drill 1.12.0 commit: aaff1b35b7339fb4e6ab480dd517994ff9f0a5c5

{noformat}
0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_prtn_prune_01 PARTITION BY (col_state) 
. . . . . . . . . . . . . . > AS 
. . . . . . . . . . . . . . > SELECT CAST(columns[0] AS DATE) col_date, 
. . . . . . . . . . . . . . > CAST(columns[1] AS CHAR(3)) col_state, 
. . . . . . . . . . . . . . > CAST(columns[2] AS INTEGER) col_prime, 
. . . . . . . . . . . . . . > CAST(columns[3] AS VARCHAR(256)) col_varstr, 
. . . . . . . . . . . . . . > CAST(columns[4] AS INTEGER) col_id, 
. . . . . . . . . . . . . . > CAST(columns[5] AS VARCHAR(50)) col_name 
. . . . . . . . . . . . . . > from `partition_prune_data.csv`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1638640                    |
+-----------+----------------------------+
1 row selected (70.986 seconds)

0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(*) from tbl_prtn_prune_01 where col_state = 'CA';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        Scan(groupscan=[files = [/tmp/tbl_prtn_prune_01/0_0_5.parquet], numFiles = 1, DynamicPojoRecordReader{records = [[35653]]}])

Another test

0: jdbc:drill:schema=dfs.tmp> explain plan for select c1 from `DRILL_4589/1998/Q3` where c1 > 1000 limit 1;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(c1=[$0])
00-02        SelectionVectorRemover
00-03          Limit(fetch=[1])
00-04            Limit(fetch=[1])
00-05              Filter(condition=[>($0, 1000)])
00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/DRILL_4589/1998/Q3/f459.parquet]], selectionRoot=/tmp/DRILL_4589/1998/Q3, numFiles=1, usedMetadataFile=true, cacheFileRoot=/tmp/DRILL_4589/1998/Q3, columns=[`c1`]]])
{noformat}
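To check the fixed plans without eyeballing them, the scan line can be parsed for its file list and numFiles value. A minimal sketch, assuming the plan text is available as a plain string (e.g. copied from sqlline); scan_summary and pruning_visible are hypothetical helper names, not part of Drill, and the regexes only cover the groupscan formats that appear in this ticket.

```python
import re

# Hypothetical helper (not a Drill API): extract numFiles and the listed
# .parquet paths from one line of EXPLAIN PLAN text. The regexes assume
# the groupscan formats seen in this ticket; other plugins may differ.
def scan_summary(plan_text):
    # Paths are delimited by whitespace, brackets, commas or '=' in these plans.
    files = re.findall(r"[^\s\[\],=]+\.parquet", plan_text)
    m = re.search(r"numFiles\s*=\s*(\d+)", plan_text)
    num_files = int(m.group(1)) if m else None
    return num_files, files


# Hypothetical check: the plan names its files and the count matches numFiles.
def pruning_visible(plan_text):
    num_files, files = scan_summary(plan_text)
    return num_files is not None and len(files) == num_files


fixed = ("Scan(groupscan=[files = [/tmp/tbl_prtn_prune_01/0_0_5.parquet], "
         "numFiles = 1, DynamicPojoRecordReader{records = [[35653]]}])")
print(scan_summary(fixed))  # (1, ['/tmp/tbl_prtn_prune_01/0_0_5.parquet'])
```

On the pre-fix plans quoted in the issue description, which print only a PojoRecordReader@... scan, the sketch finds no numFiles and pruning_visible returns False.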

> Partition pruning information not available in query plan for COUNT aggregate query
> -----------------------------------------------------------------------------------
>
>                 Key: DRILL-5357
>                 URL: https://issues.apache.org/jira/browse/DRILL-5357
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.10.0
>         Environment: 3 node CentOS cluster
>            Reporter: Khurram Faraaz
>            Assignee: Arina Ielchiieva
>             Fix For: 1.12.0
>
>
> We are not seeing partition pruning information in the query plan for the 
> COUNT(*) and COUNT(<col-name>) queries below.
> Drill 1.10.0-SNAPSHOT
> git commit id: b657d44f
> parquet table has 6 columns
> total number of rows = 1638640
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE tbl_prtn_prune_01 PARTITION BY (col_state) 
> AS 
> SELECT CAST(columns[0] AS DATE) col_date, 
> CAST(columns[1] AS CHAR(3)) col_state, 
> CAST(columns[2] AS INTEGER) col_prime, 
> CAST(columns[3] AS VARCHAR(256)) col_varstr, 
> CAST(columns[4] AS INTEGER) col_id, 
> CAST(columns[5] AS VARCHAR(50)) col_name 
> from `partition_prune_data.csv`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1638640                    |
> +-----------+----------------------------+
> 1 row selected (17.675 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select COUNT(*) from tbl_prtn_prune_01 where col_state = 'CA';
> +---------+
> | EXPR$0  |
> +---------+
> | 35653   |
> +---------+
> 1 row selected (0.471 seconds)
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(*) from tbl_prtn_prune_01 where col_state = 'CA';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@1d4bb67d[columns = null, isStarQuery = false, isSkipQuery = false]])
> {noformat}
> Then I ran REFRESH TABLE METADATA on the Parquet table:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> refresh table metadata tbl_prtn_prune_01;
> +-------+-------------------------------------------------------------+
> |  ok   |                           summary                           |
> +-------+-------------------------------------------------------------+
> | true  | Successfully updated metadata for table tbl_prtn_prune_01.  |
> +-------+-------------------------------------------------------------+
> 1 row selected (0.321 seconds)
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(col_state) from tbl_prtn_prune_01 where col_state = 'CA';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@2e0f4be9[columns = null, isStarQuery = false, isSkipQuery = false]])
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(*) from tbl_prtn_prune_01 where col_state = 'CA';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@3fc1f8e7[columns = null, isStarQuery = false, isSkipQuery = false]])
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select COUNT(col_date) from tbl_prtn_prune_01 where col_state = 'CA';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@7afc851e[columns = null, isStarQuery = false, isSkipQuery = false]])
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
