[ 
https://issues.apache.org/jira/browse/HUDI-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068364#comment-17068364
 ] 

Prashant Wason commented on HUDI-741:
-------------------------------------

Implementation notes:

The hudi-hive-sync module already has code that reads the latest schema from 
a HUDI table. I have moved that code to hudi-common so that it can be used 
within HoodieWriteClient, since hudi-common is a dependency of hudi-client.

SchemaCompatibility needs to be implemented separately (explained in detail 
in the next comment).
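
For illustration only, here is a minimal sketch of the kind of check the 
issue asks for: can data written with the old schema be read with the new 
one? It is not the implementation from this patch. It models a schema as a 
map of field name to a hypothetical Field class (type name plus "has 
default" flag) and applies the standard Avro resolution rules for record 
fields and primitive type promotion. All class and method names below are 
assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SchemaEvolutionSketch {

    /** Hypothetical, simplified stand-in for one field of a record schema. */
    static final class Field {
        final String type;        // primitive type name, e.g. "int", "long", "string"
        final boolean hasDefault; // whether the schema declares a default value

        Field(String type, boolean hasDefault) {
            this.type = type;
            this.hasDefault = hasDefault;
        }
    }

    /** Avro primitive promotions: can a value written as writerType be read as readerType? */
    static boolean isPromotable(String writerType, String readerType) {
        if (writerType.equals(readerType)) {
            return true;
        }
        switch (writerType) {
            case "int":
                return readerType.equals("long") || readerType.equals("float")
                    || readerType.equals("double");
            case "long":
                return readerType.equals("float") || readerType.equals("double");
            case "float":
                return readerType.equals("double");
            case "string":
                return readerType.equals("bytes");
            case "bytes":
                return readerType.equals("string");
            default:
                return false;
        }
    }

    /** Returns true if records written with oldSchema can be read using newSchema. */
    static boolean canRead(Map<String, Field> oldSchema, Map<String, Field> newSchema) {
        for (Map.Entry<String, Field> entry : newSchema.entrySet()) {
            Field written = oldSchema.get(entry.getKey());
            Field wanted = entry.getValue();
            if (written == null) {
                // A field added by the new schema must have a default to fill in old records.
                if (!wanted.hasDefault) {
                    return false;
                }
            } else if (!isPromotable(written.type, wanted.type)) {
                // The stored type cannot be widened to the type the reader expects.
                return false;
            }
        }
        // Fields present only in the old schema are ignored on read (plain Avro behaviour).
        return true;
    }

    public static void main(String[] args) {
        Map<String, Field> oldSchema = new LinkedHashMap<>();
        oldSchema.put("id", new Field("int", false));

        Map<String, Field> newSchema = new LinkedHashMap<>();
        newSchema.put("id", new Field("long", false));    // widened type
        newSchema.put("note", new Field("string", true)); // new field with a default

        System.out.println("old data readable with new schema: "
            + canRead(oldSchema, newSchema));
    }
}
```

The sketch follows plain Avro semantics, so a field that is dropped in the 
new schema is silently accepted. Whether the HUDI-specific check should be 
stricter than that is not addressed here.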

The unit tests require a way to generate records using various schemas. 
HoodieTestDataGenerator is hardcoded to use TRIP_EXAMPLE_SCHEMA, so it has to 
be modified to take the schema as a parameter.
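
As a rough sketch of what "take the schema as a parameter" could look like, 
the example below shows a generator that is constructed with a schema 
instead of relying on a hardcoded constant. It does not use the real 
HoodieTestDataGenerator API. The class name, the map-based schema 
representation, and the supported type names are all assumptions made for 
the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class SchemaDrivenGeneratorSketch {
    // Hypothetical schema model: field name -> primitive type name.
    private final Map<String, String> schema;
    private final Random random;

    /** The schema is supplied by the caller rather than fixed inside the generator. */
    SchemaDrivenGeneratorSketch(Map<String, String> schema, long seed) {
        this.schema = new LinkedHashMap<>(schema);
        this.random = new Random(seed);
    }

    /** Builds one record containing a random value for every field in the schema. */
    Map<String, Object> generateRecord() {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : schema.entrySet()) {
            record.put(field.getKey(), randomValue(field.getValue()));
        }
        return record;
    }

    /** Builds count records, all conforming to the same schema. */
    List<Map<String, Object>> generateRecords(int count) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            records.add(generateRecord());
        }
        return records;
    }

    private Object randomValue(String type) {
        switch (type) {
            case "int":
                return random.nextInt(1000);
            case "long":
                return random.nextLong();
            case "double":
                return random.nextDouble();
            case "string":
                return "val-" + random.nextInt(1000);
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, String> evolvedSchema = new LinkedHashMap<>();
        evolvedSchema.put("rider", "string");
        evolvedSchema.put("fare", "double");
        evolvedSchema.put("tip", "double"); // field added by the evolved schema

        SchemaDrivenGeneratorSketch generator =
            new SchemaDrivenGeneratorSketch(evolvedSchema, 42L);
        System.out.println(generator.generateRecords(2));
    }
}
```

With a generator of this shape, a test can write one batch with the 
original schema and a second batch with an evolved schema, then assert on 
how the schema check behaves.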

 

> Fix Hoodie's schema evolution checks
> ------------------------------------
>
>                 Key: HUDI-741
>                 URL: https://issues.apache.org/jira/browse/HUDI-741
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Priority: Minor
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> HUDI requires a schema to be specified in HoodieWriteConfig; the 
> HoodieWriteClient uses it to create the records. The schema is also saved 
> in the data files (Parquet format) and log files (Avro format).
> Since a schema is required each time new data is ingested into a HUDI 
> dataset, the schema can evolve over time. However, HUDI should ensure that 
> the evolved schema is compatible with the older schema.
> HUDI-specific validation of schema evolution should ensure that a newer 
> schema can be used for the dataset by checking that data written using the 
> old schema can be read using the new schema.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
