[ https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arina Ielchiieva updated DRILL-4185:
------------------------------------
Issue Type: Improvement (was: Bug)
> UNION ALL involving empty directory on any side of union all results in
> Failed query
> ------------------------------------------------------------------------------------
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Relational Operators
> Affects Versions: 1.4.0
> Reporter: Khurram Faraaz
> Assignee: Vitalii Diravka
> Priority: Major
> Labels: doc-impacting, ready-to-commit
> Fix For: 1.13.0
>
>
> A UNION ALL query that involves an empty directory on either side of the
> UNION ALL operator results in a FAILED query. We should return the results
> for the non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory: the directory exists, but it has
> no files in it.
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL
> select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table
> 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010]
> (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from
> `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table
> 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010]
> (state=,code=0)
> {code}
> *Fix overview:*
> With this fix, Drill can query an empty directory, which is treated as a
> schemaless Drill table for now.
> Users can query an empty directory and use it in queries with any JOIN and
> UNION (UNION ALL) operators.
> An empty directory with Parquet metadata cache files is a schemaless Drill
> table as well.
> It behaves similarly to empty files:
> - A query with star (SELECT *) returns an empty result.
> - If fields are specified in the SELECT statement, those fields are
> returned as INT-OPTIONAL types.
> - An empty directory in a query with the UNION operator does not change the
> result; it is as if the UNION branch were absent from the query.
> - A query with joins returns an empty result, except for outer joins in
> which the row-preserving side (the left table of a LEFT JOIN or the right
> table of a RIGHT JOIN) has data. In that case the data from the non-empty
> table is returned.
> - The empty directory table can be used in complex queries.
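> The behaviour listed above can be illustrated with a few queries. This is
> only a sketch: it reuses empty_DIR and testWindow.csv from the report,
> assumes the dfs.tmp schema from the sqlline prompt, and the comments restate
> the expected behaviour from this overview rather than output of a real run.
> ```sql
> -- Assumption: same workspace as in the original report.
> USE dfs.tmp;
>
> -- Star query on the empty directory: expected to return an empty result.
> SELECT * FROM empty_DIR;
>
> -- Explicit field: expected to return no rows, with the field typed as
> -- INT-OPTIONAL.
> SELECT columns[0] FROM empty_DIR;
>
> -- Query from the report: expected to return only the rows of
> -- testWindow.csv, as if the empty_DIR branch were absent.
> SELECT CAST(columns[0] AS INT) c1 FROM `testWindow.csv`
> UNION ALL
> SELECT columns[0] FROM empty_DIR;
> ```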
> *Code changes:*
> Internally, an empty directory is interpreted as a DynamicDrillTable with a
> null selection. SchemalessScan, SchemalessBatchCreator and SchemalessBatch
> are introduced and used at the execution stage to interact with other
> operators and batches.
> If an empty directory contains Parquet metadata cache files, the
> ParquetGroupScan for such a table is not valid and SchemalessScan is used
> instead.