[ https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347948#comment-16347948 ]
ASF GitHub Bot commented on DRILL-6129:
---------------------------------------

GitHub user sachouche opened a pull request:

    https://github.com/apache/drill/pull/1106

    DRILL-6129: Fixed query failure due to nested column data type change

    Problem Description -
    - The Drillbit was able to successfully send batches containing different metadata (for nested columns)
    - This was the case when one or multiple scanners were involved
    - The issue happened within the client, where value vectors are cached across batches
    - The load(...) API is responsible for updating value vectors when a new batch arrives
    - The RecordBatchLoader class is used to detect schema changes; if a change is detected, the previous value vectors are discarded and new ones are created
    - There is a bug in the current implementation where only first-level columns are compared

    Fix -
    - The fix is to improve the schema diff logic by including nested columns

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachouche/drill DRILL-6129

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1106

----
commit 9ffb41f509cd2531e7f3cdf89a66605ec0fdf7a4
Author: Salim Achouche <sachouche2@...>
Date:   2018-02-01T02:59:58Z

    DRILL-6129: Fixed query failure due to nested column data type change

----

> Query fails on nested data type schema change
> ---------------------------------------------
>
>                 Key: DRILL-6129
>                 URL: https://issues.apache.org/jira/browse/DRILL-6129
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - CLI
>    Affects Versions: 1.10.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Minor
>             Fix For: 1.13.0
>
>
> Use-Case -
> * Assume two parquet files with similar schemas except for a nested column
> * Schema file1
> ** int64 field1
> ** optional group field2
> *** optional group field2.1 (LIST)
> **** repeated group list
> ***** optional group element
> ****** optional int64 child_field
> * Schema file2
> ** int64 field1
> ** optional group field2
> *** optional group field2.1 (LIST)
> **** repeated group list
> ***** optional group element
> ****** optional group child_field
> ******* optional int64 child_field_f1
> ******* optional int64 child_field_f1
> * Essentially child_field changed from an int64 to a group of fields
>
> Observed Query Failure
> select * from <file1 and file2>;
> Error: Unexpected RuntimeException: java.lang.IllegalArgumentException: The field $bits$(UINT1:REQUIRED) doesn't match the provided metadata major_type {
>   minor_type: MAP
>   mode: REQUIRED
> Note that selecting one file at a time succeeds, which seems to indicate that
> the issue has to do with the schema change logic.
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
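
The difference between the first-level-only comparison and the nested comparison described in the pull request above can be illustrated with a minimal sketch. This is not the code from the patch and does not use Drill's classes: the SchemaDiff class, its Field type, the sameTopLevel and sameDeep methods, and the type names used in main are all hypothetical stand-ins chosen for illustration.

```java
import java.util.List;

public class SchemaDiff {

    /** Hypothetical, simplified stand-in for a column's metadata (not a Drill class). */
    public static final class Field {
        final String name;
        final String type;          // illustrative type label, e.g. "BIGINT", "MAP", "LIST"
        final List<Field> children; // empty for leaf columns

        public Field(String name, String type, List<Field> children) {
            this.name = name;
            this.type = type;
            this.children = children;
        }
    }

    /** Compares only the column's own name and type, ignoring nested columns. */
    public static boolean sameTopLevel(Field a, Field b) {
        return a.name.equals(b.name) && a.type.equals(b.type);
    }

    /** Compares the column itself and then, recursively, every nested child column. */
    public static boolean sameDeep(Field a, Field b) {
        if (!sameTopLevel(a, b) || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!sameDeep(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simplified version of the use case: child_field is an int64 in file1
        // and a group of fields in file2.
        Field leaf = new Field("child_field", "BIGINT", List.of());
        Field group = new Field("child_field", "MAP",
                List.of(new Field("child_field_f1", "BIGINT", List.of())));

        Field file1 = new Field("field2", "MAP",
                List.of(new Field("field2.1", "LIST", List.of(leaf))));
        Field file2 = new Field("field2", "MAP",
                List.of(new Field("field2.1", "LIST", List.of(group))));

        System.out.println(sameTopLevel(file1, file2)); // true: the change goes unnoticed
        System.out.println(sameDeep(file1, file2));     // false: the nested change is detected
    }
}
```

With a first-level check only, the two batches appear to share a schema, so the cached value vectors would be reused; the recursive check reports a difference, which is the condition under which the old vectors should be discarded and rebuilt.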