[jira] [Commented] (DRILL-3410) Partition Pruning : We are doing a prune when we shouldn't

Steven Phillips (JIRA) Fri, 26 Jun 2015 19:25:14 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603909#comment-14603909
 ]


Steven Phillips commented on DRILL-3410:
----------------------------------------

This appears to be because the FindPartitionConditions class, the code that 
walks the expression tree and determines whether pruning is valid, assumes 
that the "binary" operators "OR" and "AND" have exactly two arguments. But you 
can see from the expression in the plan:

{code}
OR(AND(=($1, 1993), >(ITEM($2, 0), 29600)), =($1, 1994), >(ITEM($2, 0), 29700))
{code}

that the expression was rewritten as a single OR operator with 3 arguments.

Rewriting the expression with true binary operators seems to fix the problem. I 
will have a patch available shortly.
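
To illustrate the failure mode described above, here is a minimal toy model in Python. It is not Drill's FindPartitionConditions code: the names `Leaf`, `Op`, `prune_condition` and `to_binary` are hypothetical, and the `binary_only` flag simply simulates a walker that looks at only the first two operands of each operator. Under those assumptions, the 3-argument OR from the plan yields a pruning condition when it should not, and folding it into nested two-argument operators makes the same walker refuse to prune.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    text: str                   # the comparison as printed in the plan
    on_partition_column: bool   # True if it only references dir0 ($1)


@dataclass(frozen=True)
class Op:
    name: str                   # "AND" or "OR"
    operands: Tuple["Expr", ...]


Expr = Union[Leaf, Op]


def prune_condition(expr: Expr, binary_only: bool) -> Optional[Expr]:
    """Return a partition-only condition implied by expr, or None.

    binary_only=True simulates the assumed bug: only the first two
    operands of every operator are inspected.
    """
    if isinstance(expr, Leaf):
        return expr if expr.on_partition_column else None

    operands = expr.operands[:2] if binary_only else expr.operands
    parts = [prune_condition(o, binary_only) for o in operands]

    if expr.name == "AND":
        # Dropping a conjunct only widens the filter, so this stays safe.
        kept = [p for p in parts if p is not None]
        if not kept:
            return None
        return kept[0] if len(kept) == 1 else Op("AND", tuple(kept))

    # OR: every branch must be prunable, otherwise all files are needed.
    if any(p is None for p in parts):
        return None
    return Op("OR", tuple(parts))


def to_binary(expr: Expr) -> Expr:
    """Fold n-ary AND/OR into left-nested two-argument operators."""
    if isinstance(expr, Leaf):
        return expr
    children = [to_binary(o) for o in expr.operands]
    return reduce(lambda left, right: Op(expr.name, (left, right)), children)


# The filter from the plan in this issue.
year_1993 = Leaf("=($1, 1993)", True)
year_1994 = Leaf("=($1, 1994)", True)
gt_29600 = Leaf(">(ITEM($2, 0), 29600)", False)
gt_29700 = Leaf(">(ITEM($2, 0), 29700)", False)
plan_filter = Op("OR", (Op("AND", (year_1993, gt_29600)), year_1994, gt_29700))

# Third operand ignored: the walker wrongly derives OR(dir0=1993, dir0=1994).
buggy = prune_condition(plan_filter, binary_only=True)

# After the rewrite, the same two-operand walker sees the non-partition branch.
fixed = prune_condition(to_binary(plan_filter), binary_only=True)
```

In this sketch `buggy` is a non-empty pruning condition while `fixed` is `None`, meaning no pruning. A walker that iterates over all operands would be another way to get the same result.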

> Partition Pruning : We are doing a prune when we shouldn't
> ----------------------------------------------------------
>
>                 Key: DRILL-3410
>                 URL: https://issues.apache.org/jira/browse/DRILL-3410
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Rahul Challapalli
>            Assignee: Steven Phillips
>            Priority: Critical
>             Fix For: 1.1.0
>
>
> git.commit.id.abbrev=60bc945
> The plan below does not look right. Based on the filters in the query, it 
> should scan all the files. Also, Hive returned more rows than Drill.
> {code}
> explain plan for select * from `existing_partition_pruning/lineitempart` 
> where (dir0=1993 and columns[0] >29600) or (dir0=1994 or columns[0]>29700);
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T70¦¦*=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[OR(AND(=($1, 1993), >(ITEM($2, 0), 29600)), =($1, 1994), >(ITEM($2, 0), 29700))])
> 00-05              Project(T70¦¦*=[$0], dir0=[$1], columns=[$2])
> 00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart/0_0_3.parquet], ReadEntryWithPath [path=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart/0_0_4.parquet]], selectionRoot=/drill/testdata/ctas_auto_partition/existing_partition_pruning/lineitempart, numFiles=2, columns=[`*`]]])
>  |
> {code}
> I attached the data set used. Let me know if you need anything more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
