[jira] [Commented] (DRILL-6129) Query fails on nested data type schema change

ASF GitHub Bot (JIRA) Wed, 31 Jan 2018 19:16:23 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347948#comment-16347948
 ]


ASF GitHub Bot commented on DRILL-6129:
---------------------------------------

GitHub user sachouche opened a pull request:

    https://github.com/apache/drill/pull/1106

    DRILL-6129: Fixed query failure due to nested column data type change

    Problem Description -
    - The Drillbit successfully sent batches containing different metadata (for nested columns)
    - This held whether one or multiple scanners were involved
    - The issue occurred within the client, where value vectors are cached across batches
    - The load(...) API is responsible for updating value vectors when a new batch arrives
    - The RecordBatchLoader class is used to detect schema changes; when one is detected, the previous value vectors are discarded and new ones are created
    - The current implementation has a bug: only first-level columns are compared
    
    Fix -
    - The fix is to improve the schema diff logic by including nested columns
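
A minimal sketch of the difference between a first-level comparison and a nested one follows. It is not the code from the pull request or from RecordBatchLoader: the Field class, the type names ("MAP", "BIGINT"), and both method names are hypothetical stand-ins chosen for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SchemaDiffSketch {

    /** Hypothetical stand-in for a column's metadata: name, type, and child columns. */
    static final class Field {
        final String name;
        final String type;
        final List<Field> children;

        Field(String name, String type, Field... children) {
            this.name = name;
            this.type = type;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    /** First-level-only comparison: ignores children, so nested changes go unnoticed. */
    static boolean sameTopLevel(Field a, Field b) {
        return a.name.equals(b.name) && a.type.equals(b.type);
    }

    /** Recursive comparison: two fields match only if every nested child matches too. */
    static boolean sameField(Field a, Field b) {
        if (!sameTopLevel(a, b)) {
            return false;
        }
        if (a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!sameField(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }
}
```

With a column whose child changes from a scalar to a group, sameTopLevel still reports a match while sameField reports a difference, which is the kind of case where cached value vectors would need to be discarded and rebuilt.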

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachouche/drill DRILL-6129

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1106
    
----
commit 9ffb41f509cd2531e7f3cdf89a66605ec0fdf7a4
Author: Salim Achouche <sachouche2@...>
Date:   2018-02-01T02:59:58Z

    DRILL-6129: Fixed query failure due to nested column data type change

----


> Query fails on nested data type schema change
> ---------------------------------------------
>
>                 Key: DRILL-6129
>                 URL: https://issues.apache.org/jira/browse/DRILL-6129
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - CLI
>    Affects Versions: 1.10.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Minor
>             Fix For: 1.13.0
>
>
> Use-Case -
>  * Assume two parquet files with similar schemas except for a nested column
>  * Schema file1
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional int64 child_field
>  * Schema file2
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional group child_field
>  ******* optional int64 child_field_f1
>  ******* optional int64 child_field_f1
>  * Essentially child_field changed from an int64 to a group of fields
>  
> Observed Query Failure
> select * from <file1 and file2>;
> Error: Unexpected RuntimeException: java.lang.IllegalArgumentException: The field $bits$(UINT1:REQUIRED) doesn't match the provided metadata major_type {
>   minor_type: MAP
>   mode: REQUIRED
> Note that selecting one file at a time succeeds, which suggests that the 
> issue lies in the schema change logic. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
