[jira] [Comment Edited] (HIVE-13974) ORC Schema Evolution doesn't support add columns to non-last STRUCT columns

Matt McCline (JIRA) Mon, 11 Jul 2016 14:25:50 -0700

    [ https://issues.apache.org/jira/browse/HIVE-13974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371651#comment-15371651 ]


Matt McCline edited comment on HIVE-13974 at 7/11/16 9:24 PM:
--------------------------------------------------------------

[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes are different from those of 
equals.  The TypeDescription.equals method compares the (type) id and 
maximumId, which does not work when there is an interior STRUCT column with a 
different number of columns.  It makes it seem like a type conversion is 
needed when none is, and other parts of the code then throw exceptions 
complaining "no need to convert a STRING to a STRING".
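
To illustrate the id problem described above, here is a minimal sketch. It 
assumes type ids are assigned in pre-order over the schema tree. The Node 
class and the methods assignIds, equalsWithIds, and sameCategoryIgnoringIds 
are hypothetical stand-ins written for this example; they are not ORC's 
TypeDescription API and not the code in the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class TypeIdSketch {
    /** Hypothetical stand-in for a schema type node; NOT ORC's TypeDescription. */
    static final class Node {
        final String category;                 // e.g. "struct", "string", "int"
        final List<Node> children = new ArrayList<>();
        int id;                                // pre-order position in the tree
        int maximumId;                         // largest id inside this subtree

        Node(String category, Node... kids) {
            this.category = category;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    /** Assigns pre-order ids starting at 'start'; returns the next unused id. */
    static int assignIds(Node n, int start) {
        n.id = start++;
        for (Node c : n.children) {
            start = assignIds(c, start);
        }
        n.maximumId = start - 1;
        return start;
    }

    /** Id-sensitive comparison, in the spirit of the equals behavior described above. */
    static boolean equalsWithIds(Node a, Node b) {
        if (!a.category.equals(b.category) || a.id != b.id
                || a.maximumId != b.maximumId
                || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!equalsWithIds(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }

    /** Compares only the node's own category and ignores ids (assumed semantics). */
    static boolean sameCategoryIgnoringIds(Node a, Node b) {
        return a.category.equals(b.category);
    }

    public static void main(String[] args) {
        // File schema:   struct<a:struct<x:int>, b:string>
        Node fileB = new Node("string");
        Node file = new Node("struct", new Node("struct", new Node("int")), fileB);
        assignIds(file, 0);

        // Reader schema: struct<a:struct<x:int,y:int>, b:string>
        Node readerB = new Node("string");
        Node reader = new Node("struct",
                new Node("struct", new Node("int"), new Node("int")), readerB);
        assignIds(reader, 0);

        // The extra field in the interior struct shifts the id of column b.
        System.out.println("b id in file schema:   " + fileB.id);
        System.out.println("b id in reader schema: " + readerB.id);
        System.out.println("equalsWithIds:           " + equalsWithIds(fileB, readerB));
        System.out.println("sameCategoryIgnoringIds: " + sameCategoryIgnoringIds(fileB, readerB));
    }
}
```

In this sketch the trailing string column b is the same type in both schemas, 
yet the id-sensitive comparison reports a mismatch because the added field 
in the interior struct shifts every later id by one.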

There are 3 kinds of schema, not 2.  Part of the problem I'm trying to solve 
is the ambiguity in different parts of the code as to which schema is being 
used.  Is it the one being returned by the input file format, is it the 
schema being fed back to the ORC raw merger that includes ACID columns, or is 
it the unconverted file schema?  I don't care what the first 2 schemas are 
called as long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- it is not practical right now in terms of time.  We have got to 
get our release out the door.  We have so little runway left.  I've had 10+ 
JIRAs for weeks.  Whenever I knock some down more appear.  Also, there really 
needs to be a parallel HIVE JIRA for it and we must make sure name mapping is 
fully supported for HIVE.  Given how *difficult* Schema Evolution has been, I 
simply don't believe it will *just work* with ORC-only unit tests.

FYI [~hagleitn] [~ekoifman]



was (Author: mmccline):

[~owen.omalley] Thanks for looking at this.

No, the semantics of sameCategoryAndAttributes are different from those of 
equals.  The TypeDescription.equals method compares the (type) id and 
maximumId, which does not work when there is an interior STRUCT column with a 
different number of columns.  It makes it seem like a type conversion is 
needed when none is, and other parts of the code then throw exceptions 
complaining "no need to convert a STRING to a STRING".

There are 3 kinds of schema, not 2.  Part of the problem I'm trying to solve 
is the ambiguity in different parts of the code as to which schema is being 
used.  Is it the one being returned by the input file format, is it the 
schema being fed back to the ORC raw merger that includes ACID columns, or is 
it the unconverted file schema?  I don't care what the first 2 schemas are 
called as long as the names are distinct.  Maybe the names could be reader, 
internalReader, and file.

About ORC-54 -- it is not practical right now in terms of time.  We have got to 
get Erie out the door.  We have so little runway left.  I've had 10+ JIRAs for 
weeks.  Whenever I knock some down more appear.  Also, there really needs to be 
a parallel HIVE JIRA for it and we must make sure name mapping is fully 
supported for HIVE.  Given how *difficult* Schema Evolution has been, I simply 
don't believe it will *just work* with ORC-only unit tests.

FYI [~hagleitn] [~ekoifman]


> ORC Schema Evolution doesn't support add columns to non-last STRUCT columns
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-13974
>                 URL: https://issues.apache.org/jira/browse/HIVE-13974
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, ORC, Transactions
>    Affects Versions: 1.3.0, 2.1.0, 2.2.0
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Blocker
>         Attachments: HIVE-13974.01.patch, HIVE-13974.02.patch, 
> HIVE-13974.03.patch, HIVE-13974.04.patch, HIVE-13974.05.WIP.patch, 
> HIVE-13974.06.patch, HIVE-13974.07.patch, HIVE-13974.08.patch, 
> HIVE-13974.09.patch, HIVE-13974.091.patch
>
>
> Currently, the included columns are based on the fileSchema and not the 
> readerSchema which doesn't work for adding columns to non-last STRUCT data 
> type columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
