[
https://issues.apache.org/jira/browse/HUDI-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068364#comment-17068364
]
Prashant Wason commented on HUDI-741:
-------------------------------------
Implementation notes:
The hudi-hive-sync module already has code that reads the latest schema from a
HUDI table. I have moved that code to hudi-common so that it can be used within
HoodieWriteClient, since hudi-common is a dependency of hudi-client.
SchemaCompatibility needs to be implemented separately (explained in detail in
the next comment).
The unit test requires a way to generate records using various schemas.
HoodieTestDataGenerator is hardcoded to use TRIP_EXAMPLE_SCHEMA, so it has to
be modified to take the schema as a parameter.
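
To illustrate the rule stated in the issue description (data written using the
old schema must be readable using the new schema), here is a minimal,
self-contained sketch. It is not the HUDI implementation and does not use the
Avro library: the class SchemaEvolutionSketch, its Field type, and the field
names "rider", "fare" and "tip" are all hypothetical, and only flat records
with a few numeric widenings are modelled, loosely following Avro's schema
resolution rules.

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaEvolutionSketch {

    /** Hypothetical, simplified field: a primitive type name plus whether a default value exists. */
    static final class Field {
        final String type;
        final boolean hasDefault;

        Field(String type, boolean hasDefault) {
            this.type = type;
            this.hasDefault = hasDefault;
        }
    }

    /** True if a value written as writerType can be read as readerType (same type or numeric widening). */
    static boolean typeReadable(String writerType, String readerType) {
        if (writerType.equals(readerType)) {
            return true;
        }
        switch (writerType) {
            case "int":
                return readerType.equals("long") || readerType.equals("float") || readerType.equals("double");
            case "long":
                return readerType.equals("float") || readerType.equals("double");
            case "float":
                return readerType.equals("double");
            default:
                return false;
        }
    }

    /** True if records written with oldSchema can be read with newSchema. */
    static boolean canRead(Map<String, Field> oldSchema, Map<String, Field> newSchema) {
        for (Map.Entry<String, Field> entry : newSchema.entrySet()) {
            Field written = oldSchema.get(entry.getKey());
            Field wanted = entry.getValue();
            if (written == null) {
                // Field is new: old records lack it, so the reader needs a default value.
                if (!wanted.hasDefault) {
                    return false;
                }
            } else if (!typeReadable(written.type, wanted.type)) {
                // Field exists in both schemas, but the written type cannot be read as the new type.
                return false;
            }
        }
        // Fields present only in oldSchema are simply ignored by the reader.
        return true;
    }

    public static void main(String[] args) {
        Map<String, Field> oldSchema = new HashMap<>();
        oldSchema.put("rider", new Field("string", false));
        oldSchema.put("fare", new Field("int", false));

        // Widens "fare" to long and adds "tip" with a default value.
        Map<String, Field> evolved = new HashMap<>();
        evolved.put("rider", new Field("string", false));
        evolved.put("fare", new Field("long", false));
        evolved.put("tip", new Field("double", true));

        // Adds "tip" without a default value, so old records cannot supply it.
        Map<String, Field> broken = new HashMap<>();
        broken.put("rider", new Field("string", false));
        broken.put("fare", new Field("int", false));
        broken.put("tip", new Field("double", false));

        System.out.println(canRead(oldSchema, evolved)); // true
        System.out.println(canRead(oldSchema, broken));  // false
    }
}
```

The check is deliberately one-directional: it only asks whether old data can be
read with the new schema, which is the direction the issue description asks
for. Nested records, unions and other complex types are left out of this
sketch.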
> Fix Hoodie's schema evolution checks
> ------------------------------------
>
> Key: HUDI-741
> URL: https://issues.apache.org/jira/browse/HUDI-741
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Reporter: Prashant Wason
> Priority: Minor
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> HUDI requires a schema to be specified in HoodieWriteConfig; the
> HoodieWriteClient uses it to create the records. The schema is also saved in
> the data files (parquet format) and log files (avro format).
> Since a schema is required each time new data is ingested into a HUDI
> dataset, the schema can evolve over time. But HUDI should ensure that the
> evolved schema is compatible with the older schema.
> HUDI-specific validation of schema evolution should ensure that a newer
> schema can be used for the dataset by checking that the data written using
> the old schema can be read using the new schema.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)