[jira] [Commented] (DRILL-6129) Query fails on nested data type schema change

ASF GitHub Bot (JIRA) Thu, 01 Feb 2018 09:25:26 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348951#comment-16348951
 ]


ASF GitHub Bot commented on DRILL-6129:
---------------------------------------

Github user sachouche commented on the issue:

    https://github.com/apache/drill/pull/1106
  
    Thanks @paul-rogers for the information; I went through the PR and noticed 
the following:
    
    - BatchSchema invokes MaterializedField.isEquivalent()
    - With my fix, both RecordBatchLoader isSame(...) and 
MaterializedField.isEquivalent() consider nested columns, but they differ in 
several ways:
    
    1) RecordBatchLoader requires sameness because this information is used to 
decide whether to reuse the value vectors; if the old and new batches are 
deemed the same, the value vectors are reloaded using the load(...) API. The 
metadata must be the same, or a runtime exception will occur
    
    2) RecordBatchLoader isSame(...) API compares two different Java objects: 
SerializedField (obtained from protobufs) and the MaterializedField of the 
already materialized value vectors
    
    3) RecordBatchLoader isSame(...) API tolerates unordered fields (within the 
same level), but the MaterializedField.isEquivalent() method does not
    
    4) MaterializedField.isEquivalent() ignores hidden columns such as "$bits" 
and "$offsets", but RecordBatchLoader isSame(...) does not
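
    To make differences 3) and 4) concrete, here is a minimal Java sketch. It 
does not use Drill's classes: Field, sameUnordered(), equivalentOrdered() and 
the rule that a hidden column is any child whose name starts with "$" are all 
assumptions made for illustration, and child names are assumed to be unique 
within a level. It only mimics the two rule sets described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchemaCompareSketch {

    /** Hypothetical stand-in for a column description; not a Drill class. */
    static final class Field {
        final String name;
        final String type;
        final List<Field> children;

        Field(String name, String type, Field... children) {
            this.name = name;
            this.type = type;
            this.children = List.of(children);
        }
    }

    /** isSame-like rules: child order is ignored, hidden columns are compared. */
    static boolean sameUnordered(Field a, Field b) {
        if (!a.name.equals(b.name) || !a.type.equals(b.type)
                || a.children.size() != b.children.size()) {
            return false;
        }
        // Index b's children by name (names assumed unique within a level).
        Map<String, Field> byName = new HashMap<>();
        for (Field child : b.children) {
            byName.put(child.name, child);
        }
        for (Field child : a.children) {
            Field match = byName.remove(child.name);
            if (match == null || !sameUnordered(child, match)) {
                return false;
            }
        }
        return true;
    }

    /** isEquivalent-like rules: child order matters, hidden columns are skipped. */
    static boolean equivalentOrdered(Field a, Field b) {
        if (!a.name.equals(b.name) || !a.type.equals(b.type)) {
            return false;
        }
        List<Field> left = visibleChildren(a);
        List<Field> right = visibleChildren(b);
        if (left.size() != right.size()) {
            return false;
        }
        for (int i = 0; i < left.size(); i++) {
            if (!equivalentOrdered(left.get(i), right.get(i))) {
                return false;
            }
        }
        return true;
    }

    /** Assumption: a hidden column is any child whose name starts with "$". */
    static List<Field> visibleChildren(Field field) {
        List<Field> visible = new ArrayList<>();
        for (Field child : field.children) {
            if (!child.name.startsWith("$")) {
                visible.add(child);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Field ab = new Field("m", "MAP",
                new Field("a", "BIGINT"), new Field("b", "BIGINT"));
        Field ba = new Field("m", "MAP",
                new Field("b", "BIGINT"), new Field("a", "BIGINT"));
        Field abWithBits = new Field("m", "MAP",
                new Field("$bits$", "UINT1"),
                new Field("a", "BIGINT"), new Field("b", "BIGINT"));

        System.out.println(sameUnordered(ab, ba));             // true
        System.out.println(equivalentOrdered(ab, ba));         // false
        System.out.println(sameUnordered(ab, abWithBits));     // false
        System.out.println(equivalentOrdered(ab, abWithBits)); // true
    }
}
```

    In this sketch the two checks disagree in both directions: reordering the 
children only breaks the ordered check, and adding a hidden child only breaks 
the unordered one. A document of schema-change rules would need to cover both 
cases.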
    
    I think that, moving forward, the best way to prevent bugs related to 
schema changes is to maintain a document that establishes all the rules. This 
will allow QA to refine their tests and catch current limitations.



> Query fails on nested data type schema change
> ---------------------------------------------
>
>                 Key: DRILL-6129
>                 URL: https://issues.apache.org/jira/browse/DRILL-6129
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Client - CLI
>    Affects Versions: 1.10.0
>            Reporter: salim achouche
>            Assignee: salim achouche
>            Priority: Minor
>             Fix For: 1.13.0
>
>
> Use-Case -
>  * Assume two parquet files with similar schemas except for a nested column
>  * Schema file1
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional int64 child_field
>  * Schema file2
>  ** int64 field1
>  ** optional group field2
>  *** optional group field2.1 (LIST)
>  **** repeated group list
>  ***** optional group element
>  ****** optional group child_field
>  ******* optional int64 child_field_f1
>  ******* optional int64 child_field_f1
>  * Essentially child_field changed from an int64 to a group of fields
>  
> Observed Query Failure
> select * from <file1 and file2>;
> Error: Unexpected RuntimeException: java.lang.IllegalArgumentException: The 
> field $bits$(UINT1:REQUIRED) doesn't match the provided metadata major_type {
>   minor_type: MAP
>   mode: REQUIRED
> Note that selecting one file at a time succeeds, which seems to indicate that 
> the issue has to do with the schema change logic. 
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
