[ https://issues.apache.org/jira/browse/DRILL-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274265#comment-14274265 ]

Jinfeng Ni commented on DRILL-1500:
-----------------------------------

+1.

The patch looks good to me.

The issue seems to have been introduced in the flatten operator work. I'm
wondering how we could prevent this kind of ProjectPrel replacement in any
future PrelVisitor. One idea is to make the ProjectAllowDupPrel constructor
private and publicly expose only the copy() method, plus a public static
method that explicitly creates a new instance of this special type of
ProjectPrel. This might help prevent similar issues in the future.
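
To make the idea concrete, here is a minimal sketch. It does not use Drill's
real classes: SketchProject and SketchProjectAllowDup are hypothetical
stand-ins for ProjectPrel and ProjectAllowDupPrel, and the field list is
reduced to plain strings. It only illustrates the private constructor, the
public copy() and the static factory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a plain projection node (not Drill's ProjectPrel).
class SketchProject {
    private final List<String> fields;

    public SketchProject(List<String> fields) {
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    // A visitor rebuilds nodes through copy(), so each subclass keeps its type.
    public SketchProject copy(List<String> newFields) {
        return new SketchProject(newFields);
    }

    public List<String> getFields() {
        return fields;
    }
}

// Hypothetical stand-in for the special projection that allows duplicate names.
final class SketchProjectAllowDup extends SketchProject {
    // Private: no code outside this class can call the constructor directly.
    private SketchProjectAllowDup(List<String> fields) {
        super(fields);
    }

    // The single explicit way to create the special projection from scratch.
    public static SketchProjectAllowDup create(List<String> fields) {
        return new SketchProjectAllowDup(fields);
    }

    // Overridden so that copying never degrades to the plain projection type.
    @Override
    public SketchProjectAllowDup copy(List<String> newFields) {
        return new SketchProjectAllowDup(newFields);
    }
}
```

With this shape, a visitor that calls copy() on whatever node it holds keeps
the special type, and the only way to build one from scratch is the clearly
named create() method.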



 

> Partition filtering might lead to an unnecessary column in the result set. 
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-1500
>                 URL: https://issues.apache.org/jira/browse/DRILL-1500
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: 
> 0001-DRILL-1500-Partial-fix-Don-t-overwrite-top-level-Pro.patch
>
>
> When partition filtering is used together with a select * query, Drill might 
> return the partitioning column twice. 
> Q1:
> {code}
> select * from 
> dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet`
>  where dir0=1994 and dir1='Q1' order by dir0 limit 1;
> +------------+------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> |   dir00    |    dir0    |    dir1    |  o_clerk   | o_comment  | o_custkey  | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice |
> +------------+------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> | 1994       | 1994       | Q1         | Clerk#000000743 | y pending requests integrate | 1292       | 1994-01-20  | 66         | 5-LOW           | F             | 0              | 104190.66    |
> +------------+------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> 1 row selected (2.097 seconds)
> {code}
> We can see that the partitioning column dir0 appears twice in the result 
> set, as "dir00" and "dir0". In comparison, here is the same query without 
> partition filtering, and its result:
> Q2:
> {code}
> select * from 
> dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet`
>  order by dir0 limit 1;
> +------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> |    dir0    |    dir1    |  o_clerk   | o_comment  | o_custkey  | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice |
> +------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> | 1994       | Q1         | Clerk#000000743 | y pending requests integrate | 1292       | 1994-01-20  | 66         | 5-LOW           | F             | 0              | 104190.66    |
> +------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> 1 row selected (0.761 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
