[
https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346641#comment-16346641
]
ASF GitHub Bot commented on DRILL-4185:
---------------------------------------
Github user vdiravka commented on a diff in the pull request:
https://github.com/apache/drill/pull/1083#discussion_r165023581
--- Diff: exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java ---
@@ -568,6 +570,22 @@ public void nullMixedComparatorEqualJoinHelper(final String query) throws Except
.go();
}
+ /** InnerJoin with empty dir table on nullable cols, MergeJoin */
+ // TODO: the same tests should be added for HashJoin operator, DRILL-6070
+ @Test
--- End diff --
The bug was found in the NLJ (nested loop join) operator with empty tables. I have resolved that issue.
A separate test class has been added for empty directory tables and the different join
operators.
I have also refactored the TestHashJoinAdvanced, TestMergeJoinAdvanced, and
TestNestedLoopJoin classes.
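A minimal sketch, assuming the join-operator tests pin the operator through session options: the helper below builds the `alter session` statements that would steer the planner toward one join operator per run. The class `JoinOperatorOptions`, its enum, and the method names are hypothetical, and the option names (`planner.enable_hashjoin`, `planner.enable_mergejoin`, `planner.enable_nestedloopjoin`, `planner.enable_nljoin_for_scalar_only`) are assumptions to verify against the Drill version in use. This is not the code of the new test class.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical test helper: builds ALTER SESSION statements that steer the
 * planner toward a single join operator. Option names are assumptions.
 */
public class JoinOperatorOptions {

    enum JoinOperator { HASH_JOIN, MERGE_JOIN, NESTED_LOOP_JOIN }

    static List<String> sessionSettings(JoinOperator op) {
        boolean nlj = op == JoinOperator.NESTED_LOOP_JOIN;
        return Arrays.asList(
            set("planner.enable_hashjoin", op == JoinOperator.HASH_JOIN),
            set("planner.enable_mergejoin", op == JoinOperator.MERGE_JOIN),
            set("planner.enable_nestedloopjoin", nlj),
            // Assumed default: NLJ is planned only when one input is scalar.
            // Lift that restriction only when NLJ is the operator under test.
            set("planner.enable_nljoin_for_scalar_only", !nlj));
    }

    private static String set(String option, boolean enabled) {
        return "alter session set `" + option + "` = " + enabled;
    }

    public static void main(String[] args) {
        for (JoinOperator op : JoinOperator.values()) {
            System.out.println(op + " -> " + sessionSettings(op));
        }
    }
}
```

Each test would run one of these statement lists before issuing the same join query against the empty directory table, so a single query exercises every operator.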
> UNION ALL involving empty directory on any side of union all results in Failed query
> ------------------------------------------------------------------------------------
>
> Key: DRILL-4185
> URL: https://issues.apache.org/jira/browse/DRILL-4185
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.4.0
> Reporter: Khurram Faraaz
> Assignee: Vitalii Diravka
> Priority: Major
> Labels: doc-impacting
>
> A UNION ALL query that involves an empty directory on either side of the
> UNION ALL operator results in a FAILED query. We should return the results
> for the non-empty side (input) of UNION ALL.
> Note that empty_DIR is an empty directory: the directory exists, but it has
> no files in it.
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] (state=,code=0)
> {code}
> *Fix overview:*
> With this issue resolved, Drill can query an empty directory. For now, such a
> directory is treated as a schemaless Drill table.
> Users can query an empty directory and use it in queries with any JOIN and
> UNION (UNION ALL) operators.
> An empty directory that contains Parquet metadata cache files is a schemaless
> Drill table as well.
> It behaves similarly to empty files:
> - A query with a star (SELECT *) returns an empty result.
> - If specific fields are named in the SELECT statement, those fields are
> returned with the INT-OPTIONAL (nullable INT) type.
> - An empty directory in a query with a UNION operator does not change the
> result: the result is the same as if the UNION branch over the empty
> directory were absent from the query.
> - A query with joins returns an empty result, except for outer joins in which
> the outer table of a "right join" or the derived table of a "left join"
> contains data. In that case, the data from the non-empty table is returned.
> - An empty directory table can be used in complex queries.
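The projection and join rules in the list above can be illustrated with a toy, in-memory model. This is a sketch under stated assumptions, not Drill code: rows are `Object[]`, `null` stands for SQL NULL, joins match on the first column of each side, and `EmptyTableSemantics`, `projectFromEmpty`, `innerJoin`, and `leftJoin` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, not Drill code) of query results over an empty,
 * schemaless table. Rows are Object[]; null stands for SQL NULL; joins match
 * on the first column of each side.
 */
public class EmptyTableSemantics {

    /**
     * Schema of a projection from an empty directory: each named column
     * becomes INT-OPTIONAL (nullable INT). SELECT * is modeled as an empty
     * column list, which yields an empty schema. There are never any rows.
     */
    static Map<String, String> projectFromEmpty(List<String> columns) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (String column : columns) {
            schema.put(column, "INT-OPTIONAL");
        }
        return schema;
    }

    /** Inner equi-join: with an empty side, no pair can match. */
    static List<Object[]> innerJoin(List<Object[]> left, List<Object[]> right) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] l : left) {
            for (Object[] r : right) {
                if (l[0] != null && l[0].equals(r[0])) {
                    out.add(concat(l, r));
                }
            }
        }
        return out;
    }

    /** Left outer join: unmatched left rows are padded with rightWidth nulls. */
    static List<Object[]> leftJoin(List<Object[]> left, List<Object[]> right, int rightWidth) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] l : left) {
            boolean matched = false;
            for (Object[] r : right) {
                if (l[0] != null && l[0].equals(r[0])) {
                    out.add(concat(l, r));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(concat(l, new Object[rightWidth]));
            }
        }
        return out;
    }

    private static Object[] concat(Object[] a, Object[] b) {
        Object[] joined = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, joined, a.length, b.length);
        return joined;
    }

    public static void main(String[] args) {
        List<Object[]> data = Arrays.asList(new Object[] {1, "a"}, new Object[] {2, "b"});
        List<Object[]> emptyDir = Collections.emptyList();
        System.out.println(projectFromEmpty(Arrays.asList("c1", "c2"))); // {c1=INT-OPTIONAL, c2=INT-OPTIONAL}
        System.out.println(innerJoin(data, emptyDir).size());           // 0
        System.out.println(leftJoin(data, emptyDir, 1).size());         // 2
    }
}
```

The left-join case mirrors the outer-join exception: the non-empty side is preserved, and the columns that would come from the empty directory are null.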
> *Code changes:*
> Internally, an empty directory is interpreted as a DynamicDrillTable with a
> null selection. SchemalessScan, SchemalessBatchCreator, and SchemalessBatch
> are introduced and used at the execution stage for interactions with other
> operators and batches.
> If an empty directory contains Parquet metadata cache files, the
> ParquetGroupScan for such a table is not valid, and SchemalessScan is used
> instead.
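One way to picture how a schemaless batch can cooperate with other operators is an input that ends immediately. The sketch assumes that the schemaless batch reports end-of-data on its first `next()` call; `Outcome`, `Batch`, `SchemalessInput`, `ListInput`, and `unionAll` are simplified, hypothetical stand-ins loosely modeled on Drill's `RecordBatch`/`IterOutcome` protocol, not its real API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/**
 * Simplified, hypothetical model of the batch protocol: an input for an empty
 * directory carries no schema and reports end-of-data on its first call, so a
 * downstream UNION ALL just drains the other input.
 */
public class SchemalessBatchSketch {

    /** Hypothetical subset of outcomes, loosely modeled on Drill's IterOutcome. */
    enum Outcome { OK, NONE }

    interface Batch {
        Outcome next();           // advance to the next chunk of rows
        List<Object[]> rows();    // rows of the current chunk, valid after OK
    }

    /** Stand-in for an empty directory: no schema, no rows, ends immediately. */
    static final class SchemalessInput implements Batch {
        public Outcome next() { return Outcome.NONE; }
        public List<Object[]> rows() { return Collections.emptyList(); }
    }

    /** Stand-in for a scan over real data: emits each chunk once, then NONE. */
    static final class ListInput implements Batch {
        private final Iterator<List<Object[]>> chunks;
        private List<Object[]> current = Collections.emptyList();

        ListInput(List<List<Object[]>> chunks) { this.chunks = chunks.iterator(); }

        public Outcome next() {
            if (!chunks.hasNext()) {
                return Outcome.NONE;
            }
            current = chunks.next();
            return Outcome.OK;
        }

        public List<Object[]> rows() { return current; }
    }

    /** UNION ALL: drain the left input, then the right one. */
    static List<Object[]> unionAll(Batch left, Batch right) {
        List<Object[]> out = new ArrayList<>();
        for (Batch input : new Batch[] {left, right}) {
            while (input.next() == Outcome.OK) {
                out.addAll(input.rows());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object[]> chunk = new ArrayList<>();
        chunk.add(new Object[] {1});
        chunk.add(new Object[] {2});
        Batch csv = new ListInput(Collections.singletonList(chunk));
        System.out.println(unionAll(new SchemalessInput(), csv).size()); // 2
    }
}
```

Because the empty input contributes nothing, UNION ALL returns exactly the rows of the non-empty side, whichever side the empty directory is on.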
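The choice between a schemaless scan and a regular scan can be sketched as a directory check. This assumes the Parquet metadata cache files are named `.drill.parquet_metadata` and `.drill.parquet_metadata_directories`; the `ScanChooser` class and the `ScanKind` values are hypothetical. The sketch folds the "no files at all" case (a null selection) and the "cache files only" case into one test; Drill makes this decision during planning with its own file-selection logic, not with this code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

/**
 * Hypothetical sketch of the planning decision: a directory with no data
 * files (no files at all, or only Parquet metadata cache files) is read with
 * a schemaless scan.
 */
public class ScanChooser {

    /** REGULAR_SCAN stands for a data scan such as ParquetGroupScan. */
    enum ScanKind { SCHEMALESS_SCAN, REGULAR_SCAN }

    /** Assumed names of Drill's Parquet metadata cache files. */
    static final List<String> METADATA_CACHE_FILES =
        Arrays.asList(".drill.parquet_metadata", ".drill.parquet_metadata_directories");

    static ScanKind choose(Path dir) {
        try (Stream<Path> entries = Files.walk(dir)) {
            boolean hasDataFile = entries
                .filter(Files::isRegularFile)
                .map(p -> p.getFileName().toString())
                .anyMatch(name -> !METADATA_CACHE_FILES.contains(name));
            // Only directories and cache files: a data scan would have nothing
            // valid to read, so fall back to a schemaless scan.
            return hasDataFile ? ScanKind.REGULAR_SCAN : ScanKind.SCHEMALESS_SCAN;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path emptyDir = Files.createTempDirectory("empty_DIR");
        System.out.println(choose(emptyDir)); // SCHEMALESS_SCAN
        Files.createFile(emptyDir.resolve(".drill.parquet_metadata"));
        System.out.println(choose(emptyDir)); // SCHEMALESS_SCAN
        Files.createFile(emptyDir.resolve("0_0_0.parquet"));
        System.out.println(choose(emptyDir)); // REGULAR_SCAN
    }
}
```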
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)