[https://issues.apache.org/jira/browse/HUDI-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068359#comment-17068359]
Prashant Wason commented on HUDI-741:
-------------------------------------
Since inserts and updates are performed through HoodieWriteClient, the schema
check can be implemented there. It involves two steps:
Step 1: Read the latest schema from the dataset, i.e. the schema that was used
to write data in the last commit.
Step 2: Validate the writeSchema in HoodieWriteConfig (the new schema) against
the schema retrieved in step 1 (the existing schema).
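The two steps can be sketched as follows. This is a minimal Python
illustration, not Hudi's (Java) implementation: CommitMetadata, latest_schema,
validate_write_schema and SchemaEvolutionError are hypothetical names, schemas
are modeled as parsed Avro JSON, instant times are assumed to sort
lexicographically in commit order (as timestamp-formatted strings do), and a
dataset with no recorded schema is assumed to have nothing to validate against.
The compatibility rule itself is passed in as a function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CommitMetadata:
    """Hypothetical stand-in for one completed commit on the dataset timeline."""
    instant_time: str       # assumed to sort lexicographically in commit order
    schema: Optional[dict]  # Avro schema (parsed JSON) used by that write, if recorded


class SchemaEvolutionError(Exception):
    """Raised when the new write schema is not compatible with the existing one."""


def latest_schema(timeline: List[CommitMetadata]) -> Optional[dict]:
    """Step 1: return the schema recorded by the most recent commit that has one."""
    for commit in sorted(timeline, key=lambda c: c.instant_time, reverse=True):
        if commit.schema is not None:
            return commit.schema
    return None  # no schema recorded yet (e.g. first write): nothing to check


def validate_write_schema(
    timeline: List[CommitMetadata],
    write_schema: dict,
    is_compatible: Callable[[dict, dict], bool],
) -> None:
    """Step 2: check the new write schema against the existing schema."""
    existing = latest_schema(timeline)
    if existing is None:
        return
    # is_compatible(existing, new) should return True when data written with
    # `existing` can be read with `new`.
    if not is_compatible(existing, write_schema):
        raise SchemaEvolutionError(
            f"write schema {write_schema.get('name')!r} is not compatible "
            "with the schema of the latest commit"
        )
```

A real caller would pass a proper reader/writer compatibility check as
is_compatible; a crude rule such as "no field is ever removed" is enough to
exercise the flow.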
> Fix Hoodie's schema evolution checks
> ------------------------------------
>
> Key: HUDI-741
> URL: https://issues.apache.org/jira/browse/HUDI-741
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Reporter: Prashant Wason
> Priority: Minor
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> HUDI requires a schema to be specified in HoodieWriteConfig; HoodieWriteClient
> uses this schema to create the records. The schema is also saved in the
> data files (parquet format) and log files (avro format).
> Since a schema is supplied each time new data is ingested into a HUDI
> dataset, the schema can evolve over time. However, HUDI should ensure that
> the evolved schema is compatible with the older schema.
> HUDI-specific validation of schema evolution should ensure that a newer
> schema can be used for the dataset by checking that data written with the
> old schema can be read with the new schema.
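The criterion above corresponds to Avro's reader/writer schema resolution,
with the old schema as the writer and the new schema as the reader. A minimal
Python sketch of that rule, restricted to records and primitive types (unions,
enums, arrays, maps, fixed and aliases are out of scope), might look like
this; can_read is a hypothetical name. Avro's Java library ships a complete
implementation in org.apache.avro.SchemaCompatibility, which a real check
could delegate to.

```python
# Avro's allowed primitive promotions: writer type -> reader types it may be read as.
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def _type_name(schema) -> str:
    """Return the type name of a primitive ("int" or {"type": "int"}) or a record."""
    if isinstance(schema, str):
        return schema
    if isinstance(schema, dict):
        return schema["type"]
    # A JSON list is a union; resolving unions is out of scope for this sketch.
    raise NotImplementedError("unions are not handled by this sketch")


def can_read(writer, reader) -> bool:
    """True if data written with `writer` (old schema) can be read with `reader` (new)."""
    w, r = _type_name(writer), _type_name(reader)
    if w == "record" and r == "record":
        # Avro matches records by unqualified name; aliases are ignored here.
        if writer["name"].split(".")[-1] != reader["name"].split(".")[-1]:
            return False
        writer_fields = {f["name"]: f["type"] for f in writer["fields"]}
        for field in reader["fields"]:
            if field["name"] in writer_fields:
                # Field present in both schemas: its type must resolve recursively.
                if not can_read(writer_fields[field["name"]], field["type"]):
                    return False
            elif "default" not in field:
                # New field without a default: old data has no value to fill in.
                return False
        # Fields that exist only in the writer are skipped when reading.
        return True
    if w == "record" or r == "record":
        return False  # a record never resolves against a non-record type
    if w not in PRIMITIVES or r not in PRIMITIVES:
        raise NotImplementedError(f"type {w!r} -> {r!r} is not handled by this sketch")
    return w == r or r in PROMOTIONS.get(w, set())
```

Note the direction of the check: because the old schema is the writer and the
new schema the reader, widening a field from int to long passes, while
narrowing it from long back to int fails.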
--
This message was sent by Atlassian Jira
(v8.3.4#803005)