[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

Barna Zsombor Klara (JIRA) Tue, 09 May 2017 01:26:46 -0700

    [ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16002276#comment-16002276 ]


Barna Zsombor Klara commented on HIVE-16559:
--------------------------------------------

Just to clarify: it is technically possible to fix this issue in 
{{ObjectInspectorConverters}} by matching the converters for the input and 
output fields by field name (currently they are matched by field order). 
However, this would add overhead to every select from a table, even when 
there is no schema evolution. I don't think this tradeoff is worth it, 
especially since altering the table with the CASCADE option yields correct 
results with a one-time overhead, paid when the column changes are propagated 
to the partitions.
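
A sketch of that workaround, reusing the table and columns from the description below. The only change from the failing script is the trailing CASCADE keyword, which asks Hive to apply the column change to the existing partitions' metadata as well. This assumes a Hive version where {{ALTER TABLE ... REPLACE COLUMNS}} accepts CASCADE; treat it as illustrative rather than as the fix in the attached patch.

{code}
-- Same column change as in the reproduction steps, but propagated to the
-- partitions so that the table and partition serdes stay in sync.
ALTER TABLE myparquettable_parted
REPLACE COLUMNS
(
  favnumber int,
  age int
) CASCADE;

-- The query that failed in the reproduction steps.
SELECT * FROM myparquettable_parted WHERE day='2017-04-04';
{code}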

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16559
>                 URL: https://issues.apache.org/jira/browse/HIVE-16559
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16559.01.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables 
> backed by files with different schemas. Hive should match the table columns 
> with file columns based on the column name if possible.
> However, if the serde for a table is missing columns that are present in the 
> serde of a partition, Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>    'mary' as name,
>    5 AS favnumber,
>    'blue' AS favcolor,
>    35 AS age,
>    'dog' AS favpet;
> ALTER TABLE myparquettable_parted
> REPLACE COLUMNS
> (
>   favnumber int,
>   age int
> );   -- No CASCADE option, so the partition will not be altered.
> {code}
> {{SELECT * FROM myparquettable_parted where day='2017-04-04';}}
> will fail with:
> {{java.lang.UnsupportedOperationException: Cannot inspect 
> org.apache.hadoop.io.IntWritable}}
> Hive should either match the columns together or prevent the user from 
> dropping columns from the table.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
