[ 
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bridget Bevens updated DRILL-4185:
----------------------------------
    Labels: doc-complete ready-to-commit  (was: doc-impacting ready-to-commit)

> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-4185
>                 URL: https://issues.apache.org/jira/browse/DRILL-4185
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>    Affects Versions: 1.4.0
>            Reporter: Khurram Faraaz
>            Assignee: Vitalii Diravka
>            Priority: Major
>              Labels: doc-complete, ready-to-commit
>             Fix For: 1.13.0
>
>
> A UNION ALL query that involves an empty directory on either side of the 
> UNION ALL operator fails. We should return the results for the non-empty 
> side (input) of UNION ALL.
>  Note that empty_DIR is an empty directory: the directory exists, but it has 
> no files in it.
> Drill 1.4 git.commit.id=b9068117
>  4 node cluster on CentOS
> {code:java}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL 
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from 
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] 
> (state=,code=0)
> {code}
> *Solution overview:*
>  With this issue resolved, Drill can query an empty directory, which is 
> treated as a schemaless Drill table for now. 
>  Users can query an empty directory and use it in queries with any JOIN and 
> UNION (UNION ALL) operators.
>  An empty directory with parquet metadata cache files is a schemaless Drill 
> table as well. 
>  It works similarly to empty files:
>  - A query with a star returns an empty result.
>  - If specific fields are listed in the select statement, those fields are 
> returned as INT-OPTIONAL types.
>  - An empty directory in a query with a UNION operator does not change the 
> result; it is as if the UNION branch were absent from the query.
>  - A query with joins returns an empty result, except when outer join clauses 
> are used and the outer table for "right join" or derived table for 
> "left join" has data. In that case the data from the non-empty table is 
> returned.
>  - The empty directory table can be used in complex queries.
> *Code changes:*
>  Internally, an empty directory is interpreted as a DynamicDrillTable with a 
> null selection. SchemalessScan, SchemalessBatchCreator and SchemalessBatch 
> are introduced and used at the execution stage for interactions with other 
> operators and batches.
>  If an empty directory contains parquet metadata cache files, the 
> ParquetGroupScan for such a table is not valid, and SchemalessScan is used 
> instead.
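
The behaviour listed in the solution overview can be sketched as a toy model. This is only an illustration and not Drill code: tables are modelled as Python lists of dict rows, the names EMPTY_DIR, union_all, inner_join and left_join are hypothetical, and the left-join case assumes that columns from the empty side come back as nulls.

```python
# Toy model only: tables are lists of dict rows. This is NOT Drill code.
EMPTY_DIR = []  # an existing directory with no files: no rows, no schema


def union_all(left, right):
    """UNION ALL concatenates both inputs; an empty input adds nothing."""
    return list(left) + list(right)


def inner_join(left, right, key):
    """Inner join on one key; an empty input on either side gives no rows."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def left_join(left, right, key, right_cols):
    """Left outer join; unmatched left rows get None for the right columns."""
    out = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            # Assumption: columns of the empty side are filled with nulls.
            out.append({**{c: None for c in right_cols}, **l})
    return out


rows = [{"c1": 1}, {"c1": 2}]  # stand-in for the rows of testWindow.csv
print(union_all(EMPTY_DIR, rows))                # same rows as the non-empty side
print(inner_join(rows, EMPTY_DIR, "c1"))         # no rows
print(left_join(rows, EMPTY_DIR, "c1", ["c2"]))  # left rows, c2 is None
```

In this model the empty directory behaves like an input with zero rows, which is why UNION ALL returns only the non-empty side and only outer joins can still produce data.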


