[ 
https://issues.apache.org/jira/browse/DRILL-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-5681:
------------------------------------
    Description: 
The following repo was based on a testcase provided by Arjun 
Rajan([email protected]). 

Drill returns incorrect query result, when the query has a correlated subquery 
and querying against a view defined with select *, or querying a subquery with 
select *.  

Case 1: Querying view with select * + correlated subquery
{code}
create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`;
{code}

//Q1 :  return 25 rows. The correct answer is 0 row. 
{code}
SELECT n_nationkey, n_name
FROM  dfs.tmp.nation_view a
WHERE NOT EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)

+--------------+-----------------+
| n_nationkey  |     n_name      |
+--------------+-----------------+
| 0            | ALGERIA         |
| 1            | ARGENTINA       |
| 2            | BRAZIL          |
...
| 24           | UNITED STATES   |
+--------------+-----------------+
25 rows selected (0.614 seconds)
{code}

// Q2:  return 0 row. The correct answer is 25 rows.
{code}
SELECT n_nationkey, n_name
FROM  dfs.tmp.nation_view a
WHERE EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)
+--------------+---------+
| n_nationkey  | n_name  |
+--------------+---------+
+--------------+---------+
No rows selected (0.4 seconds)
{code}

Case 2: Querying a table expression with select *
// Q3: return 25 rows. The correct result is 0 row
{code}
SELECT n_nationkey, n_name
FROM  (
  SELECT * FROM cp.`tpch/nation.parquet`
) a
WHERE NOT EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)
+--------------+-----------------+
| n_nationkey  |     n_name      |
+--------------+-----------------+
| 0            | ALGERIA         |
| 1            | ARGENTINA       |
...
| 24           | UNITED STATES   |
+--------------+-----------------+
25 rows selected (0.451 seconds)
{code}

Q4: return 0 row. The correct result is 25 rows.
{code}
SELECT n_nationkey, n_name
FROM  (
  SELECT * FROM cp.`tpch/nation.parquet`
) a
WHERE EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)
+--------------+---------+
| n_nationkey  | n_name  |
+--------------+---------+
+--------------+---------+
No rows selected (0.515 seconds)
{code}

All cases can be reproduced without view usage, using sub-select with star is 
enough.
For example:
{code}
SELECT n_nationkey, n_name
FROM  (select * from cp.`tpch/nation.parquet`) a
WHERE NOT EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)
{code}


  was:
The following repo was based on a testcase provided by Arjun 
Rajan([email protected]). 

Drill returns incorrect query result, when the query has a correlated subquery 
and querying against a view defined with select *, or querying a subquery with 
select *.  

Case 1: Querying view with select * + correlated subquery
{code}
create view dfs.tmp.region_view as select * from cp.`tpch/region.parquet`;
{code}

//Q1 :  return 25 rows. The correct answer is 0 row. 
{code}
SELECT n_nationkey, n_name
FROM  dfs.tmp.nation_view a
WHERE NOT EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)

+--------------+-----------------+
| n_nationkey  |     n_name      |
+--------------+-----------------+
| 0            | ALGERIA         |
| 1            | ARGENTINA       |
| 2            | BRAZIL          |
...
| 24           | UNITED STATES   |
+--------------+-----------------+
25 rows selected (0.614 seconds)
{code}

// Q2:  return 0 row. The correct answer is 25 rows.
{code}
SELECT n_nationkey, n_name
FROM  dfs.tmp.nation_view a
WHERE EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)
+--------------+---------+
| n_nationkey  | n_name  |
+--------------+---------+
+--------------+---------+
No rows selected (0.4 seconds)
{code}

Case 2: Querying a table expression with select *
// Q3: return 25 rows. The correct result is 0 row
{code}
SELECT n_nationkey, n_name
FROM  (
  SELECT * FROM cp.`tpch/nation.parquet`
) a
WHERE NOT EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)
+--------------+-----------------+
| n_nationkey  |     n_name      |
+--------------+-----------------+
| 0            | ALGERIA         |
| 1            | ARGENTINA       |
...
| 24           | UNITED STATES   |
+--------------+-----------------+
25 rows selected (0.451 seconds)
{code}

Q4: return 0 row. The correct result is 25 rows.
{code}
SELECT n_nationkey, n_name
FROM  (
  SELECT * FROM cp.`tpch/nation.parquet`
) a
WHERE EXISTS (SELECT 1
FROM cp.`tpch/region.parquet` b
WHERE b.r_regionkey =  a.n_regionkey
)
+--------------+---------+
| n_nationkey  | n_name  |
+--------------+---------+
+--------------+---------+
No rows selected (0.515 seconds)
{code}





> Incorrect query result when query uses star and correlated subquery
> -------------------------------------------------------------------
>
>                 Key: DRILL-5681
>                 URL: https://issues.apache.org/jira/browse/DRILL-5681
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> The following repo was based on a testcase provided by Arjun 
> Rajan([email protected]). 
> Drill returns incorrect query result, when the query has a correlated 
> subquery and querying against a view defined with select *, or querying a 
> subquery with select *.  
> Case 1: Querying view with select * + correlated subquery
> {code}
> create view dfs.tmp.nation_view as select * from cp.`tpch/nation.parquet`;
> {code}
> //Q1 :  return 25 rows. The correct answer is 0 row. 
> {code}
> SELECT n_nationkey, n_name
> FROM  dfs.tmp.nation_view a
> WHERE NOT EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey
> )
> +--------------+-----------------+
> | n_nationkey  |     n_name      |
> +--------------+-----------------+
> | 0            | ALGERIA         |
> | 1            | ARGENTINA       |
> | 2            | BRAZIL          |
> ...
> | 24           | UNITED STATES   |
> +--------------+-----------------+
> 25 rows selected (0.614 seconds)
> {code}
> // Q2:  return 0 row. The correct answer is 25 rows.
> {code}
> SELECT n_nationkey, n_name
> FROM  dfs.tmp.nation_view a
> WHERE EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey
> )
> +--------------+---------+
> | n_nationkey  | n_name  |
> +--------------+---------+
> +--------------+---------+
> No rows selected (0.4 seconds)
> {code}
> Case 2: Querying a table expression with select *
> // Q3: return 25 rows. The correct result is 0 row
> {code}
> SELECT n_nationkey, n_name
> FROM  (
>   SELECT * FROM cp.`tpch/nation.parquet`
> ) a
> WHERE NOT EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey
> )
> +--------------+-----------------+
> | n_nationkey  |     n_name      |
> +--------------+-----------------+
> | 0            | ALGERIA         |
> | 1            | ARGENTINA       |
> ...
> | 24           | UNITED STATES   |
> +--------------+-----------------+
> 25 rows selected (0.451 seconds)
> {code}
> Q4: return 0 row. The correct result is 25 rows.
> {code}
> SELECT n_nationkey, n_name
> FROM  (
>   SELECT * FROM cp.`tpch/nation.parquet`
> ) a
> WHERE EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey
> )
> +--------------+---------+
> | n_nationkey  | n_name  |
> +--------------+---------+
> +--------------+---------+
> No rows selected (0.515 seconds)
> {code}
> All cases can be reproduced without view usage, using sub-select with star is 
> enough.
> For example:
> {code}
> SELECT n_nationkey, n_name
> FROM  (select * from cp.`tpch/nation.parquet`) a
> WHERE NOT EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey
> )
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to