[jira] [Commented] (DRILL-4590) Queries return wrong results when applied to views

Dechang Gu (JIRA) Thu, 07 Apr 2016 08:42:49 -0700

    [ 
https://issues.apache.org/jira/browse/DRILL-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230424#comment-15230424
 ]


Dechang Gu commented on DRILL-4590:
-----------------------------------

This issue can be easily reproduced as follows (using the drill built-in sample 
database):
1. create the views:
0: jdbc:drill:schema=dfs.tpchViews> create or replace view cview as select 
c_custkey, c_name, c_acctbal, c_nationkey from cp.`tpch/customer.parquet`;
+-------+---------------------------------------------------------------+
|  ok   |                            summary                            |
+-------+---------------------------------------------------------------+
| true  | View 'cview' replaced successfully in 'dfs.tpchViews' schema  |
+-------+---------------------------------------------------------------+
1 row selected (0.286 seconds)
0: jdbc:drill:schema=dfs.tpchViews> create or replace view nview as select 
n_nationkey, n_name, n_regionkey from cp.`tpch/nation.parquet`;
+-------+---------------------------------------------------------------+
|  ok   |                            summary                            |
+-------+---------------------------------------------------------------+
| true  | View 'nview' replaced successfully in 'dfs.tpchViews' schema  |
+-------+---------------------------------------------------------------+
1 row selected (0.282 seconds)


2. Check the execution plans for the query on the views, which shows what 
causes the wrong results:
0: jdbc:drill:schema=dfs.tpchViews> explain plan for select count(*) from cview 
c, nview n where c.c_nationkey = n.n_nationkey and 10.0 > (select 
min(c1.c_acctbal) from cview c1 where c1.c_nationkey = n.n_nationkey);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-03          Project($f0=[0])
00-04            HashJoin(condition=[=($0, $1)], joinType=[inner])
00-06              Project(c_custkey=[$0])
00-08                HashJoin(condition=[=($1, $2)], joinType=[inner])
00-11                  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], 
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, 
usedMetadataFile=false, columns=[`c_custkey`, `c_nationkey`]]])
00-10                  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], 
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, 
usedMetadataFile=false, columns=[`n_nationkey`]]])
00-05              Project(c_custkey0=[$0])
00-07                SelectionVectorRemover
00-09                  Filter(condition=[>(10.0, $1)])
00-12                    HashAgg(group=[{0}], EXPR$0=[MIN($1)])
00-13                      Project(c_custkey0=[$2], c_acctbal=[$0])
00-14                        HashJoin(condition=[=($1, $2)], joinType=[inner])
00-16                          Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], 
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, 
usedMetadataFile=false, columns=[`c_acctbal`, `c_nationkey`]]])
00-15                          HashAgg(group=[{0}])
00-17                            Project(c_custkey=[$0])
00-18                              HashJoin(condition=[=($1, $2)], 
joinType=[inner])
00-20                                Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], 
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, 
usedMetadataFile=false, columns=[`c_custkey`, `c_nationkey`]]])
00-19                                Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], 
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, 
usedMetadataFile=false, columns=[`n_nationkey`]]])

3. run the simple query against the views:
0: jdbc:drill:schema=dfs.tpchViews> select count(*) from cview c, nview n where 
c.c_nationkey = n.n_nationkey and 10.0 > (select min(c1.c_acctbal) from cview 
c1 where c1.c_nationkey = n.n_nationkey);
+---------+
| EXPR$0  |
+---------+
| 24      |
+---------+
1 row selected (0.888 seconds)



4. run the same query against the parquet tables:
0: jdbc:drill:schema=dfs.tpchViews> select count(*) from 
cp.`tpch/customer.parquet` c, cp.`tpch/nation.parquet` n where c.c_nationkey = 
n.n_nationkey and 10.0 > (select min(c1.c_acctbal) from 
cp.`tpch/customer.parquet` c1 where c1.c_nationkey = n.n_nationkey);
+---------+
| EXPR$0  |
+---------+
| 1500    |
+---------+
1 row selected (0.534 seconds)


and  
5. show the plan on parquet tables:
0: jdbc:drill:schema=dfs.tpchViews> explain plan for select count(*) from 
cp.`tpch/customer.parquet` c, cp.`tpch/nation.parquet` n where c.c_nationkey = 
n.n_nationkey and 10.0 > (select min(c1.c_acctbal) from 
cp.`tpch/customer.parquet` c1 where c1.c_nationkey = n.n_nationkey);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-03          Project($f0=[0])
00-04            HashJoin(condition=[=($0, $1)], joinType=[inner])
00-06              Project(c_nationkey=[$0])
00-08                HashJoin(condition=[=($0, $1)], joinType=[inner])
00-11                  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], 
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, 
usedMetadataFile=false, columns=[`c_nationkey`]]])
00-10                  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], 
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, 
usedMetadataFile=false, columns=[`n_nationkey`]]])
00-05              Project(c_nationkey0=[$0])
00-07                SelectionVectorRemover
00-09                  Filter(condition=[>(10.0, $1)])
00-12                    HashAgg(group=[{0}], EXPR$0=[MIN($1)])
00-13                      Project(c_nationkey0=[$2], c_acctbal=[$1])
00-14                        HashJoin(condition=[=($0, $2)], joinType=[inner])
00-16                          Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], 
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, 
usedMetadataFile=false, columns=[`c_nationkey`, `c_acctbal`]]])
00-15                          Project(c_nationkey0=[$0])
00-17                            HashAgg(group=[{0}])
00-18                              Project(c_nationkey=[$0])
00-19                                HashJoin(condition=[=($0, $1)], 
joinType=[inner])
00-21                                  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], 
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, 
usedMetadataFile=false, columns=[`c_nationkey`]]])
00-20                                  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], 
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, 
usedMetadataFile=false, columns=[`n_nationkey`]]])


6.  To verify the results:

0: jdbc:drill:schema=dfs.tpchViews> select min(c_acctbal) from cview;
+----------+
|  EXPR$0  |
+----------+
| -994.79  |
+----------+
1 row selected (0.261 seconds)
0: jdbc:drill:schema=dfs.tpchViews> select min(c_acctbal), max(c_acctbal) from 
cp.`tpch/customer.parquet`;
+----------+----------+
|  EXPR$0  |  EXPR$1  |
+----------+----------+
| -994.79  | 9987.71  |
+----------+----------+
1 row selected (0.248 seconds)
0: jdbc:drill:schema=dfs.tpchViews> select count(*) from 
cp.`tpch/customer.parquet`;
+---------+
| EXPR$0  |
+---------+
| 1500    |
+---------+
1 row selected (0.2 seconds)


> Queries return wrong results when applied to views
> --------------------------------------------------
>
>                 Key: DRILL-4590
>                 URL: https://issues.apache.org/jira/browse/DRILL-4590
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.6.0
>         Environment: RHEL 6.4  2.6.32-358.el6.x86_64
>            Reporter: Dechang Gu
>            Priority: Critical
>
> When run tpch queries on views created from parquet tables, query 17 returned 
> wrong results: 
> [root@ucs-node1 bugs]# /opt/mapr/drill/drill-1.6.0/bin/sqlline -u 
> "jdbc:drill:schema=dfs.tpchViews" -f /tmp/TPCH_17.sql 
> 1/1          select 
> sum(l.l_extendedprice) / 7.0 as avg_yearly 
> from 
> lineitem l, 
> part p 
> where 
> p.p_partkey = l.l_partkey 
> and p.p_brand = 'Brand#13' 
> and p.p_container = 'JUMBO CAN' 
> and l.l_quantity < ( 
> select 
> 0.2 * avg(l2.l_quantity) 
> from 
> lineitem l2 
> where 
> l2.l_partkey = p.p_partkey 
> );
> +---------------------+
> |     avg_yearly      |
> +---------------------+
> | 1139490.7042857148  |
> +---------------------+
> 1 row selected (20.364 seconds)
> While the same query directly on the parquet tables shows the correct results:
> [root@ucs-node1 bugs]# /opt/mapr/drill/drill-1.6.0/bin/sqlline -u 
> "jdbc:drill:schema=dfs.parquet" -f /tmp/17_par100.q 
> 1/1          select 
> sum(l.l_extendedprice) / 7.0 as avg_yearly 
> from 
> lineitem_par100 l, 
> part_par100 p 
> where 
> p.p_partkey = l.l_partkey 
> and p.p_brand = 'Brand#13' 
> and p.p_container = 'JUMBO CAN' 
> and l.l_quantity < ( 
> select 
> 0.2 * avg(l2.l_quantity) 
> from 
> lineitem_par100 l2 
> where 
> l2.l_partkey = p.p_partkey 
> );
> +----------------------+
> |      avg_yearly      |
> +----------------------+
> | 3.237333813714285E7  |
> +----------------------+
> 1 row selected (25.266 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (DRILL-4590) Queries return wrong results when applied to views

Reply via email to