[jira] [Commented] (DRILL-4185) UNION ALL involving empty directory on any side of union all results in Failed query

ASF GitHub Bot (JIRA) Wed, 31 Jan 2018 03:26:18 -0800

    [ https://issues.apache.org/jira/browse/DRILL-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346641#comment-16346641 ]


ASF GitHub Bot commented on DRILL-4185:
---------------------------------------

Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1083#discussion_r165023581
  
    --- Diff: exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java ---
    @@ -568,6 +570,22 @@ public void nullMixedComparatorEqualJoinHelper(final String query) throws Except
             .go();
       }
     
    +  /** InnerJoin with empty dir table on nullable cols, MergeJoin */
    +  // TODO: the same tests should be added for HashJoin operator, DRILL-6070
    +  @Test
    --- End diff --
    
    The bug was found for NLJ (nested loop join) with empty tables. I have resolved that issue.
    A separate test class has been added for empty directory tables with the different join operators.
    
    I have also refactored the TestHashJoinAdvanced, TestMergeJoinAdvanced and TestNestedLoopJoin classes.


> UNION ALL involving empty directory on any side of union all results in 
> Failed query
> ------------------------------------------------------------------------------------
>
>                 Key: DRILL-4185
>                 URL: https://issues.apache.org/jira/browse/DRILL-4185
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.4.0
>            Reporter: Khurram Faraaz
>            Assignee: Vitalii Diravka
>            Priority: Major
>              Labels: doc-impacting
>
> A UNION ALL query that involves an empty directory on either side of the 
> UNION ALL operator results in a FAILED query. Drill should return the results 
> for the non-empty side (input) of the UNION ALL.
> Note that empty_DIR is an empty directory: the directory exists, but it has 
> no files in it. 
> Drill 1.4 git.commit.id=b9068117
> 4 node cluster on CentOS
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select columns[0] from empty_DIR UNION ALL select cast(columns[0] as int) c1 from `testWindow.csv`;
> Error: VALIDATION ERROR: From line 1, column 24 to line 1, column 32: Table 'empty_DIR' not found
> [Error Id: 5c024786-6703-4107-8a4a-16c96097be08 on centos-01.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> select cast(columns[0] as int) c1 from `testWindow.csv` UNION ALL select columns[0] from empty_DIR;
> Error: VALIDATION ERROR: From line 1, column 90 to line 1, column 98: Table 'empty_DIR' not found
> [Error Id: 58c98bc4-99df-425c-aa07-c8c5faec4748 on centos-01.qa.lab:31010] (state=,code=0)
> {code}
> *Fix overview:*
> With this issue resolved, Drill can query an empty directory. For now it is 
> treated as a schemaless Drill table. 
> A user can query an empty directory and use it in queries with any JOIN and 
> UNION (UNION ALL) operators.
> An empty directory with parquet metadata cache files is a schemaless Drill 
> table as well. 
> It behaves like an empty file:
> - A query with star returns an empty result. 
> - If fields are named in the select statement, those fields are returned as 
> INT-OPTIONAL types. 
> - An empty directory in a query with a UNION operator does not change the 
> result; it is as if the UNION branch were absent from the query.
> - A query with joins returns an empty result, except with outer join clauses 
> when the outer table for a "right join" or the derived table for a "left join" 
> has data. In that case the data from the non-empty table is returned.
> - The empty directory table can be used in complex queries.
> *Code changes:*
> Internally, an empty directory is interpreted as a DynamicDrillTable with a 
> null selection. SchemalessScan, SchemalessBatchCreator and SchemalessBatch are 
> introduced and used during execution for interaction with other operators 
> and batches.
> If an empty directory contains parquet metadata cache files, the 
> ParquetGroupScan for such a table is not valid and SchemalessScan is used instead.
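
The behaviour listed in the fix overview can be illustrated with a toy model. The sketch below is not Drill code and does not use SchemalessScan or any Drill API: the class name `EmptyDirSemantics`, its methods, and the idea of representing a table as a list of non-null integer keys are all assumptions made for illustration. It only shows why, under ordinary SQL semantics, an empty input leaves UNION ALL unchanged, empties an inner join, and still lets an outer join return the rows of the non-empty side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model (hypothetical, not Drill code): a table is a list of non-null integer keys. */
public class EmptyDirSemantics {

  /** UNION ALL is concatenation, so an empty input leaves the other side unchanged. */
  public static List<Integer> unionAll(List<Integer> left, List<Integer> right) {
    List<Integer> out = new ArrayList<>(left);
    out.addAll(right);
    return out;
  }

  /** INNER JOIN on key equality: produces no rows when either side is empty. */
  public static List<Integer[]> innerJoin(List<Integer> left, List<Integer> right) {
    List<Integer[]> out = new ArrayList<>();
    for (Integer l : left) {
      for (Integer r : right) {
        if (l.equals(r)) {
          out.add(new Integer[] {l, r});
        }
      }
    }
    return out;
  }

  /** LEFT JOIN: every left row survives; unmatched rows carry null for the right side. */
  public static List<Integer[]> leftJoin(List<Integer> left, List<Integer> right) {
    List<Integer[]> out = new ArrayList<>();
    for (Integer l : left) {
      boolean matched = false;
      for (Integer r : right) {
        if (l.equals(r)) {
          out.add(new Integer[] {l, r});
          matched = true;
        }
      }
      if (!matched) {
        out.add(new Integer[] {l, null});
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> emptyDir = Collections.emptyList(); // stands in for the empty directory
    List<Integer> data = Arrays.asList(1, 2, 3);      // stands in for a non-empty table
    System.out.println("UNION ALL rows:  " + unionAll(emptyDir, data).size());
    System.out.println("INNER JOIN rows: " + innerJoin(data, emptyDir).size());
    System.out.println("LEFT JOIN rows:  " + leftJoin(data, emptyDir).size());
  }
}
```

A right join behaves like `leftJoin` with the arguments swapped, which is why the result depends on which side of the outer join holds the data.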



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
