[
https://issues.apache.org/jira/browse/DRILL-4590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230424#comment-15230424
]
Dechang Gu commented on DRILL-4590:
-----------------------------------
This issue can be easily reproduced as follows (using the drill built-in sample
database):
1. create the views:
0: jdbc:drill:schema=dfs.tpchViews> create or replace view cview as select
c_custkey, c_name, c_acctbal, c_nationkey from cp.`tpch/customer.parquet`;
+-------+---------------------------------------------------------------+
| ok | summary |
+-------+---------------------------------------------------------------+
| true | View 'cview' replaced successfully in 'dfs.tpchViews' schema |
+-------+---------------------------------------------------------------+
1 row selected (0.286 seconds)
0: jdbc:drill:schema=dfs.tpchViews> create or replace view nview as select
n_nationkey, n_name, n_regionkey from cp.`tpch/nation.parquet`;
+-------+---------------------------------------------------------------+
| ok | summary |
+-------+---------------------------------------------------------------+
| true | View 'nview' replaced successfully in 'dfs.tpchViews' schema |
+-------+---------------------------------------------------------------+
1 row selected (0.282 seconds)
2. Check the execution plans for the query on the views, which shows what
causes the wrong results:
0: jdbc:drill:schema=dfs.tpchViews> explain plan for select count(*) from cview
c, nview n where c.c_nationkey = n.n_nationkey and 10.0 > (select
min(c1.c_acctbal) from cview c1 where c1.c_nationkey = n.n_nationkey);
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(EXPR$0=[$0])
00-02 StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-03 Project($f0=[0])
00-04 HashJoin(condition=[=($0, $1)], joinType=[inner])
00-06 Project(c_custkey=[$0])
00-08 HashJoin(condition=[=($1, $2)], joinType=[inner])
00-11 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]],
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1,
usedMetadataFile=false, columns=[`c_custkey`, `c_nationkey`]]])
00-10 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]],
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1,
usedMetadataFile=false, columns=[`n_nationkey`]]])
00-05 Project(c_custkey0=[$0])
00-07 SelectionVectorRemover
00-09 Filter(condition=[>(10.0, $1)])
00-12 HashAgg(group=[{0}], EXPR$0=[MIN($1)])
00-13 Project(c_custkey0=[$2], c_acctbal=[$0])
00-14 HashJoin(condition=[=($1, $2)], joinType=[inner])
00-16 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]],
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1,
usedMetadataFile=false, columns=[`c_acctbal`, `c_nationkey`]]])
00-15 HashAgg(group=[{0}])
00-17 Project(c_custkey=[$0])
00-18 HashJoin(condition=[=($1, $2)],
joinType=[inner])
00-20 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]],
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1,
usedMetadataFile=false, columns=[`c_custkey`, `c_nationkey`]]])
00-19 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]],
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1,
usedMetadataFile=false, columns=[`n_nationkey`]]])
3. run the simple query against the views:
0: jdbc:drill:schema=dfs.tpchViews> select count(*) from cview c, nview n where
c.c_nationkey = n.n_nationkey and 10.0 > (select min(c1.c_acctbal) from cview
c1 where c1.c_nationkey = n.n_nationkey);
+---------+
| EXPR$0 |
+---------+
| 24 |
+---------+
1 row selected (0.888 seconds)
4. run the same query against the parquet tables:
0: jdbc:drill:schema=dfs.tpchViews> select count(*) from
cp.`tpch/customer.parquet` c, cp.`tpch/nation.parquet` n where c.c_nationkey =
n.n_nationkey and 10.0 > (select min(c1.c_acctbal) from
cp.`tpch/customer.parquet` c1 where c1.c_nationkey = n.n_nationkey);
+---------+
| EXPR$0 |
+---------+
| 1500 |
+---------+
1 row selected (0.534 seconds)
and
5. show the plan on parquet tables:
0: jdbc:drill:schema=dfs.tpchViews> explain plan for select count(*) from
cp.`tpch/customer.parquet` c, cp.`tpch/nation.parquet` n where c.c_nationkey =
n.n_nationkey and 10.0 > (select min(c1.c_acctbal) from
cp.`tpch/customer.parquet` c1 where c1.c_nationkey = n.n_nationkey);
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 Project(EXPR$0=[$0])
00-02 StreamAgg(group=[{}], EXPR$0=[COUNT()])
00-03 Project($f0=[0])
00-04 HashJoin(condition=[=($0, $1)], joinType=[inner])
00-06 Project(c_nationkey=[$0])
00-08 HashJoin(condition=[=($0, $1)], joinType=[inner])
00-11 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]],
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1,
usedMetadataFile=false, columns=[`c_nationkey`]]])
00-10 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]],
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1,
usedMetadataFile=false, columns=[`n_nationkey`]]])
00-05 Project(c_nationkey0=[$0])
00-07 SelectionVectorRemover
00-09 Filter(condition=[>(10.0, $1)])
00-12 HashAgg(group=[{0}], EXPR$0=[MIN($1)])
00-13 Project(c_nationkey0=[$2], c_acctbal=[$1])
00-14 HashJoin(condition=[=($0, $2)], joinType=[inner])
00-16 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]],
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1,
usedMetadataFile=false, columns=[`c_nationkey`, `c_acctbal`]]])
00-15 Project(c_nationkey0=[$0])
00-17 HashAgg(group=[{0}])
00-18 Project(c_nationkey=[$0])
00-19 HashJoin(condition=[=($0, $1)],
joinType=[inner])
00-21 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]],
selectionRoot=classpath:/tpch/customer.parquet, numFiles=1,
usedMetadataFile=false, columns=[`c_nationkey`]]])
00-20 Scan(groupscan=[ParquetGroupScan
[entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]],
selectionRoot=classpath:/tpch/nation.parquet, numFiles=1,
usedMetadataFile=false, columns=[`n_nationkey`]]])
6. To verify the results:
0: jdbc:drill:schema=dfs.tpchViews> select min(c_acctbal) from cview;
+----------+
| EXPR$0 |
+----------+
| -994.79 |
+----------+
1 row selected (0.261 seconds)
0: jdbc:drill:schema=dfs.tpchViews> select min(c_acctbal), max(c_acctbal) from
cp.`tpch/customer.parquet`;
+----------+----------+
| EXPR$0 | EXPR$1 |
+----------+----------+
| -994.79 | 9987.71 |
+----------+----------+
1 row selected (0.248 seconds)
0: jdbc:drill:schema=dfs.tpchViews> select count(*) from
cp.`tpch/customer.parquet`;
+---------+
| EXPR$0 |
+---------+
| 1500 |
+---------+
1 row selected (0.2 seconds)
> Queries return wrong results when applied to views
> --------------------------------------------------
>
> Key: DRILL-4590
> URL: https://issues.apache.org/jira/browse/DRILL-4590
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.6.0
> Environment: RHEL 6.4 2.6.32-358.el6.x86_64
> Reporter: Dechang Gu
> Priority: Critical
>
> When run tpch queries on views created from parquet tables, query 17 returned
> wrong results:
> [root@ucs-node1 bugs]# /opt/mapr/drill/drill-1.6.0/bin/sqlline -u
> "jdbc:drill:schema=dfs.tpchViews" -f /tmp/TPCH_17.sql
> 1/1 select
> sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
> lineitem l,
> part p
> where
> p.p_partkey = l.l_partkey
> and p.p_brand = 'Brand#13'
> and p.p_container = 'JUMBO CAN'
> and l.l_quantity < (
> select
> 0.2 * avg(l2.l_quantity)
> from
> lineitem l2
> where
> l2.l_partkey = p.p_partkey
> );
> +---------------------+
> | avg_yearly |
> +---------------------+
> | 1139490.7042857148 |
> +---------------------+
> 1 row selected (20.364 seconds)
> While the same query directly on the parquet tables shows the correct results:
> [root@ucs-node1 bugs]# /opt/mapr/drill/drill-1.6.0/bin/sqlline -u
> "jdbc:drill:schema=dfs.parquet" -f /tmp/17_par100.q
> 1/1 select
> sum(l.l_extendedprice) / 7.0 as avg_yearly
> from
> lineitem_par100 l,
> part_par100 p
> where
> p.p_partkey = l.l_partkey
> and p.p_brand = 'Brand#13'
> and p.p_container = 'JUMBO CAN'
> and l.l_quantity < (
> select
> 0.2 * avg(l2.l_quantity)
> from
> lineitem_par100 l2
> where
> l2.l_partkey = p.p_partkey
> );
> +----------------------+
> | avg_yearly |
> +----------------------+
> | 3.237333813714285E7 |
> +----------------------+
> 1 row selected (25.266 seconds)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)