[ 
https://issues.apache.org/jira/browse/HIVE-15995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373792#comment-16373792
 ] 

Vihang Karajgaonkar commented on HIVE-15995:
--------------------------------------------

Thanks [~szita] for the patch. The patch assumes that the user wants to 
cascade the schema change to all the partitions as well. Instead, I think it 
would be good to introduce an optional field to the command that specifies 
clearly whether the user wants to cascade the schema change to partitions or 
not. This is in line with the {{[CASCADE|RESTRICT]}} usage in the 
{{ALTER TABLE table_name ADD|REPLACE COLUMNS ... [CASCADE|RESTRICT]}} syntax. 
The default should be RESTRICT, which means that only the table metadata is 
updated. The metadata update should be cascaded to partitions only if the user 
specifies CASCADE. Assuming that the user wants to cascade could be dangerous, 
especially if there are older partitions which don't have all the column 
information.
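
A rough sketch of what this could look like, reusing the table from the issue 
description. The optional {{[CASCADE|RESTRICT]}} keyword on {{UPDATE COLUMNS}} 
is only the proposal above, not something the current patch implements, so 
treat the exact grammar as an assumption:
{code}
-- Default behaviour (same as RESTRICT): sync only the table-level column list
-- with the serde schema; partition metadata is left untouched.
ALTER TABLE user_test1 UPDATE COLUMNS;
ALTER TABLE user_test1 UPDATE COLUMNS RESTRICT;

-- Explicit opt-in: also propagate the column change to existing partitions.
ALTER TABLE user_test1 UPDATE COLUMNS CASCADE;
{code}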

Secondly, does the q-test fail without the patch? I thought the describe 
command already uses the Deserializer internally, so the updated schema will be 
seen regardless. I think you will need to write a JUnit test and confirm that 
the metadata is updated using the metastore API. Also, can you take a look at 
the test failures above? Some of them look new (I am starting to lose track of 
the regular failures now). I am not sure whether they are related to the patch.

> Syncing metastore table with serde schema
> -----------------------------------------
>
>                 Key: HIVE-15995
>                 URL: https://issues.apache.org/jira/browse/HIVE-15995
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 1.2.1, 2.1.0, 3.0.0
>            Reporter: Michal Ferlinski
>            Assignee: Adam Szita
>            Priority: Major
>         Attachments: HIVE-15995.1.patch, HIVE-15995.2.patch, 
> HIVE-15995.3.patch, HIVE-15995.patch, cx1.avsc, cx2.avsc
>
>
> Hive enables table schema evolution via properties. For Avro, for example, we 
> can alter the 'avro.schema.url' property to update the table schema to the 
> next version. Updating properties, however, doesn't affect the column list 
> stored in the metastore DB, so the table is not in the newest version when 
> returned from the metastore API. This is a problem for tools working with the 
> metastore (e.g. Presto).
> To solve this issue I suggest introducing a new DDL statement that syncs the 
> metastore columns with those from the serde:
> {code}
> ALTER TABLE user_test1 UPDATE COLUMNS
> {code}
> Note that this is a format-independent solution. 
> To reproduce, follow the instructions below:
> - Create a table based on Avro schema version 1 (cx1.avsc)
> {code}
> CREATE EXTERNAL TABLE user_test1
>   PARTITIONED BY (dt string)
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   LOCATION
>   '/tmp/schema-evolution/user_test1'
>   TBLPROPERTIES ('avro.schema.url'='/tmp/schema-evolution/cx1.avsc');
> {code}
> - Update schema to version 2 (cx2.avsc)
> {code}
> ALTER TABLE user_test1 SET TBLPROPERTIES ('avro.schema.url' = 
> '/tmp/schema-evolution/cx2.avsc');
> {code}
> - Print serde columns (top info) and metastore columns (Detailed Table 
> Information):
> {code}
> DESCRIBE EXTENDED user_test1
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
