[jira] [Updated] (DRILL-2568) New partition pruning prevents the optimization for trivial COUNT(*) queries

Aman Sinha (JIRA) Wed, 25 Mar 2015 19:10:25 -0700

     [ https://issues.apache.org/jira/browse/DRILL-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Aman Sinha updated DRILL-2568:
------------------------------
    Attachment: 0001-DRILL-2568-Drop-filter-plan-node-if-all-conjuncts-ha.patch

Drop the Filter plan node if all conjuncts in the filter have been pushed down 
as part of partition pruning. The exception is when the pruned set of files is 
empty: in that case we add a single file, so the Filter is preserved.
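
The decision described above can be sketched roughly as follows. This is a hypothetical, simplified model, not the code in the attached patch. Conjuncts are plain strings, and the names `PruneFilterSketch`, `PruneResult` and `FilterAction` are invented for illustration. The handling of partially pushed filters (keep the Filter) is an assumption; the patch description only covers the all-pushed and empty-result cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical outcome for the Filter that sat on top of the Scan. */
enum FilterAction { DROP_FILTER, KEEP_FILTER }

/** Hypothetical result: files the new Scan reads, plus what happens to the Filter. */
final class PruneResult {
  final List<String> files;
  final FilterAction action;

  PruneResult(List<String> files, FilterAction action) {
    this.files = files;
    this.action = action;
  }
}

/** Illustrative sketch only; not Drill's partition pruning rule. */
final class PruneFilterSketch {
  /**
   * @param conjuncts     all AND-ed terms of the Filter condition
   * @param pushed        the subset that partition pruning evaluated (assumed input)
   * @param prunedFiles   files that survive pruning
   * @param originalFiles files before pruning (assumed non-empty)
   */
  static PruneResult prune(List<String> conjuncts, Set<String> pushed,
                           List<String> prunedFiles, List<String> originalFiles) {
    if (prunedFiles.isEmpty()) {
      // Nothing survived: as the patch describes, scan a single file instead.
      // That file was pruned away, so its rows do not satisfy the predicate;
      // the Filter must stay to reject them.
      List<String> single = new ArrayList<>();
      single.add(originalFiles.get(0));
      return new PruneResult(single, FilterAction.KEEP_FILTER);
    }
    // If every conjunct was handled by pruning, the Filter is redundant and can go.
    // Assumption: if any conjunct was not pushed, the Filter is kept.
    boolean allPushed = pushed.containsAll(conjuncts);
    return new PruneResult(prunedFiles,
        allPushed ? FilterAction.DROP_FILTER : FilterAction.KEEP_FILTER);
  }
}
```

With the Filter dropped, the remaining plan shape is Aggregate over Scan. That shape is the one the COUNT(*) optimization discussed in the issue needs.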

I am running the regression tests but am submitting the patch for review now. 
[~jnadeau], can you please review?

> New partition pruning prevents the optimization for trivial COUNT(*) queries
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-2568
>                 URL: https://issues.apache.org/jira/browse/DRILL-2568
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>         Attachments: 0001-DRILL-2568-Drop-filter-plan-node-if-all-conjuncts-ha.patch
>
>
> With the new interpreter-based partition pruning, if the query has only 
> partition filters and they are pushed into the Scan, we don't drop the Filter 
> node from the plan. This prevents the optimization for COUNT(*) queries 
> against Parquet files, where we read the count values directly from the 
> Parquet files instead of scanning and aggregating. The 
> ConvertCountToDirectScan rule is not applied if there is an intervening 
> Filter between the Scan and the Aggregate nodes.
> {code}
> 0: jdbc:drill:zk=local> explain plan for select count(*) from dfs.`/Users/asinha/data/multilevel/parquet` where dir0=1995;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 00-02        Project($f0=[0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, 1995)])
> 00-05              Scan(groupscan=[ParquetGroupScan [entries=[
>                        ReadEntryWithPath [path=file:/Users/asinha/data/multilevel/parquet/1995/Q1/orders_95_q1.parquet],
>                        ReadEntryWithPath [path=file:/Users/asinha/data/multilevel/parquet/1995/Q2/orders_95_q2.parquet],
>                        ReadEntryWithPath [path=file:/Users/asinha/data/multilevel/parquet/1995/Q3/orders_95_q3.parquet],
>                        ReadEntryWithPath [path=file:/Users/asinha/data/multilevel/parquet/1995/Q4/orders_95_q4.parquet]],
>                        selectionRoot=/Users/asinha/data/multilevel/parquet, numFiles=4, columns=[`dir0`]]])
> {code}
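
A toy plan matcher can show why the intervening Filter matters. This is a hypothetical sketch, not Drill's ConvertCountToDirectScan. The `PlanNode` class and the string node kinds are invented. The assumed match shape is a simplification of what the rule actually checks: a COUNT(*) Aggregate directly over a Scan, optionally through one Project.

```java
import java.util.List;

/** Hypothetical, minimal logical plan node; not a Drill or Calcite class. */
final class PlanNode {
  final String kind; // e.g. "Aggregate", "Project", "Filter", "Scan" (illustrative labels)
  final List<PlanNode> inputs;

  PlanNode(String kind, PlanNode... inputs) {
    this.kind = kind;
    this.inputs = List.of(inputs);
  }

  PlanNode input() {
    return inputs.get(0);
  }
}

/** Illustrative matcher; the real rule's operands may differ. */
final class CountToDirectScanSketch {
  /** True if an Aggregate sits on a Scan, optionally through a single Project. */
  static boolean matches(PlanNode root) {
    if (!root.kind.equals("Aggregate") || root.inputs.size() != 1) {
      return false;
    }
    PlanNode child = root.input();
    // Skip over at most one Project (the plan above has Project($f0=[0])).
    if (child.kind.equals("Project") && child.inputs.size() == 1) {
      child = child.input();
    }
    // Any other node in between, such as a Filter, prevents the match.
    return child.kind.equals("Scan");
  }
}
```

In this model, Aggregate over Project over Filter over Scan does not match. Removing the Filter once all of its conjuncts are handled by partition pruning lets the match succeed.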



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
