[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Diravka updated DRILL-4185:
-----------------------------------
    Description: 
A UNION ALL query that involves an empty directory on either side of the UNION 
ALL operator results in a FAILED query. We should return the results for the 
non-empty side (input) of UNION ALL.
 Note that empty_DIR is an empty directory: the directory exists, but it has no 
files in it.

Drill 1.4 git.commit.id=b9068117
 4 node cluster on CentOS
{code:java}
0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL select 
cast(columns[0] as int) c1 from `testWindow.csv`;
Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
'empty_DIR' not found


[Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
(state=,code=0)

0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
`testWindow.csv` UNION ALL select columns[0] from empty_DIR;
Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
'empty_DIR' not found


[Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
(state=,code=0)
{code}
*Solution overview:*
 With this issue resolved, Drill can query an empty directory. For now, such a 
directory is treated as a schemaless Drill table. 
 Users can query an empty directory and use it in queries with any JOIN or UNION 
(UNION ALL) operator.
 An empty directory that holds parquet metadata cache files is a schemaless 
Drill table as well. 
 It behaves similarly to empty files:
 - A query with a star returns an empty result.
 - If fields are named in the select statement, those fields are returned 
with INT-OPTIONAL types.
 - An empty directory on one side of a UNION operator does not change the 
result; the query behaves as if the UNION branch were absent.
 - A query with joins returns an empty result, except for outer joins where 
the outer table of a "right join" or the derived table of a "left join" has 
data. In that case the data from the non-empty table is returned.
 - The empty directory table can be used in complex queries.
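 The behaviour listed above could be exercised with queries along the following 
lines. This is only an illustrative sketch: it reuses empty_DIR and 
`testWindow.csv` from the report, the aliases t and e and the column name c1 on 
the empty side are assumptions made for the example, and the comments restate 
the behaviour described in this overview rather than captured output.
{code:sql}
-- Star query against the empty directory: expected to return an empty result.
select * from empty_DIR;

-- UNION ALL with the empty directory on one side: expected to return the same
-- rows as the select from `testWindow.csv` alone.
select cast(columns[0] as int) c1 from `testWindow.csv`
UNION ALL
select columns[0] from empty_DIR;

-- Left join with the non-empty input on the left: the rows of the non-empty
-- side are expected to be returned. The column e.c1 is hypothetical; a field
-- selected from the schemaless table is described above as INT-OPTIONAL.
select t.c1
from (select cast(columns[0] as int) c1 from `testWindow.csv`) t
left join empty_DIR e on t.c1 = e.c1;
{code}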

*Code changes:*
 Internally, an empty directory is interpreted as a DynamicDrillTable with a 
null selection. SchemalessScan, SchemalessBatchCreator and SchemalessBatch are 
introduced and used during execution to interact with other operators and 
batches.
 If an empty directory contains parquet metadata cache files, the 
ParquetGroupScan for such a table is not valid and SchemalessScan is used 
instead.

  was:
UNION ALL query that involves an empty directory on either side of UNION ALL 
operator results in FAILED query. We should return the results for the 
non-empty side (input) of UNION ALL.
Note that empty_DIR is an empty directory, the directory exists, but it has no 
files in it. 

Drill 1.4 git.commit.id=b9068117
4 node cluster on CentOS

{code}
0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL select 
cast(columns[0] as int) c1 from `testWindow.csv`;
Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
'empty_DIR' not found


[Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
(state=,code=0)

0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
`testWindow.csv` UNION ALL select columns[0] from empty_DIR;
Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
'empty_DIR' not found


[Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
(state=,code=0)
{code}

*Fix overview:*
After resolving the current issue Drill can query an empty directory. It is a 
schemaless Drill table for now. 
User can query empty directory and use it for queries with any JOIN and UNION 
(UNION ALL) operators.
Empty directory with parquet metadata cache files is schemaless Drill table as 
well. 
It works similar to empty files:
- The query with star will return empty result. 
- If some fields are indicated in select statement, that fields will be 
returned as INT-OPTIONAL types. 
- The empty directory in the query with UNION operator will not change the 
result as if the statement with UNION is absent in the query.
-  The query with joins will return an empty result except the cases of using 
outer join clauses, when the outer table for "right join" or derived table for 
"left join" has a data. In that case the data from a non-empty table is 
returned.
- The empty directory table can be used in complex queries.


*Code changes:*
Internally empty directory interprets as DynamicDrillTable with null selection. 
SchemalessScan, SchemalessBatchCreator and SchemalessBatch are introduced and 
used on execution state for interactions with other operators and batches.
If empty directory contain parquet metadata cache files, the ParquetGroupScan 
for such table is not valid and SchemalessScan is used instead of that.


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-4185
>                 URL: https://issues.apache.org/jira/browse/DRILL-4185
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.4.0
>            Reporter: Khurram Faraaz
>            Assignee: Vitalii Diravka
>            Priority: Major
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.13.0
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
