[
https://issues.apache.org/jira/browse/HIVE-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647185#comment-14647185
]
Gaurav Kumar commented on HIVE-8950:
------------------------------------
I was wondering how this will be handled in schema evolution.
In Avro, what we currently do is point avro.schema.url at the schema file.
When we want to change the schema, we only change the contents of the schema
file (for example, to add a new column), and new data is then automatically
deserialized using the new schema.
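For reference, an Avro-backed table of this kind might be declared roughly as
follows. This is a minimal sketch, assuming Hive 0.14 or later (for
STORED AS AVRO); the table name and the HDFS path are hypothetical.

```sql
-- Hypothetical table name and schema path, for illustration only.
-- No column list is given: Hive derives the columns from the Avro schema
-- file that avro.schema.url points to.
CREATE TABLE events
STORED AS AVRO
TBLPROPERTIES (
  'avro.schema.url' = 'hdfs:///schemas/events.avsc'
);
```

Editing events.avsc in place (for instance, adding a field with a default
value) changes the schema the table reads with, without touching the DDL.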
In this case, if we specify a Parquet file instead of a schema file, how will
that be used in schema evolution? We'll have to change the table DDL every
time so that it points to the file containing the newest schema. Avro tables
can have files with different schemas in different partitions.
I know there is no separate schema per se in Parquet files.
> Add support in ParquetHiveSerde to create table schema from a parquet file
> --------------------------------------------------------------------------
>
> Key: HIVE-8950
> URL: https://issues.apache.org/jira/browse/HIVE-8950
> Project: Hive
> Issue Type: Improvement
> Reporter: Ashish K Singh
> Assignee: Gaurav Kumar
> Attachments: HIVE-8950.1.patch, HIVE-8950.2.patch, HIVE-8950.3.patch,
> HIVE-8950.4.patch, HIVE-8950.5.patch, HIVE-8950.6.patch, HIVE-8950.7.patch,
> HIVE-8950.8.patch, HIVE-8950.patch
>
>
> PARQUET-76 and PARQUET-47 ask for creating Parquet-backed tables without
> having to specify the column names and types. Since Parquet files store the
> schema in their footer, it is possible to generate the Hive schema from a
> Parquet file's metadata. This will improve the usability of Parquet-backed
> tables.