[jira] [Comment Edited] (DRILL-1500) Partition filtering might lead to an unnecessary column in the result set.

Aman Sinha (JIRA) Sat, 10 Jan 2015 19:49:14 -0800

    [ https://issues.apache.org/jira/browse/DRILL-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272794#comment-14272794 ]


Aman Sinha edited comment on DRILL-1500 at 1/11/15 3:47 AM:
------------------------------------------------------------

This is not related to partition filters.  I can reproduce the extra column 
even with the query below, which does not involve partition columns.  I believe 
this has to do with how we handle the '*' column in the planning and/or 
execution phase.  In the planning phase, if a Project above a Scan produces 
both * and another column, we should remove that other column, since the * 
column subsumes all columns.  In the execution phase, there is a known issue 
with duplicates (DRILL-1778) that seems related to this. 
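
The planning-phase idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not Drill's planner code: the function name `prune_project_columns` and the representation of a Project's output as a plain list of column names are hypothetical.

```python
# Hypothetical sketch of the proposed planning-phase rule; this is not
# Drill's planner code. A Project's output is modelled as a list of names.
STAR = "*"


def prune_project_columns(columns):
    """Return the projection with explicit columns dropped when '*' is present.

    The '*' column subsumes every other column, so keeping an explicit
    column next to it can only produce a duplicate in the result set.
    """
    if STAR in columns:
        return [STAR]
    # No star column: leave the projection unchanged.
    return list(columns)


# The repro query sorts on n_regionkey, so the Project above the Scan is
# assumed to carry both '*' and 'n_regionkey'.
print(prune_project_columns(["*", "n_regionkey"]))
```

The sketch assumes that a later operator (here, the sort on n_regionkey) can still resolve the column through the expanded star; whether that holds in the real planner is exactly what would need to be verified.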

{code}
// Wrong number of columns
 jdbc:drill:zk=local> select * from cp.`tpch/nation.parquet` where n_nationkey < 2 order by n_regionkey;
+-------------+------------+-------------+------------+--------------+
| n_nationkey |   n_name   | n_regionkey | n_comment  | n_regionkey0 |
+-------------+------------+-------------+------------+--------------+
| 0           | ALGERIA    | 0           |  haggle. carefully final deposits detect slyly agai | 0            |
| 1           | ARGENTINA  | 1           | al foxes promise slyly according to the regular accounts. bold requests alon | 1            |
+-------------+------------+-------------+------------+--------------+
{code}
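
In the output above, the extra column has the name of an existing column with a digit appended (n_regionkey0 next to n_regionkey). A regression check for this symptom could look roughly like the sketch below. The helper name `find_suffixed_duplicates` is hypothetical, and the rule "name minus its last digit matches another column" is only a heuristic inferred from this output, not documented Drill behaviour.

```python
# Heuristic sketch (hypothetical helper, not part of Drill): flag result-set
# columns whose name is another column's name plus one trailing digit.
def find_suffixed_duplicates(header):
    """Return (original, duplicate) pairs found in a list of column names."""
    names = set(header)
    return [
        (name[:-1], name)
        for name in header
        if name[-1:].isdigit() and name[:-1] in names
    ]


# Column names taken from the result set shown above.
header = ["n_nationkey", "n_name", "n_regionkey", "n_comment", "n_regionkey0"]
print(find_suffixed_duplicates(header))
```

A table that legitimately contains both `col` and `col1` would be flagged as well, so a check like this is only useful against tables whose schema is known in advance.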





> Partition filtering might lead to an unnecessary column in the result set. 
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-1500
>                 URL: https://issues.apache.org/jira/browse/DRILL-1500
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 0.8.0
>
>
> When partition filtering is used together with a select * query, Drill might 
> return the partitioning column twice. 
> Q1 : 
> {code}
> select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` where dir0=1994 and dir1='Q1' order by dir0 limit 1;
> +------------+------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> |   dir00    |    dir0    |    dir1    |  o_clerk   | o_comment  | o_custkey  | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice |
> +------------+------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> | 1994       | 1994       | Q1         | Clerk#000000743 | y pending requests integrate | 1292       | 1994-01-20  | 66         | 5-LOW           | F             | 0              | 104190.66    |
> +------------+------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> 1 row selected (2.097 seconds)
> {code}
> We can see that column "dir0" appears twice in the result set (as "dir00" 
> and "dir0").  In comparison, here is the query without partition filtering 
> and its result:
> Q2:
> {code}
> select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` order by dir0 limit 1;
> +------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> |    dir0    |    dir1    |  o_clerk   | o_comment  | o_custkey  | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice |
> +------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> | 1994       | Q1         | Clerk#000000743 | y pending requests integrate | 1292       | 1994-01-20  | 66         | 5-LOW           | F             | 0              | 104190.66    |
> +------------+------------+------------+------------+------------+-------------+------------+-----------------+---------------+----------------+--------------+
> 1 row selected (0.761 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
