[jira] [Commented] (HUDI-741) Fix Hoodie's schema evolution checks

Yixue (Andrew) Zhu (Jira) Fri, 22 May 2020 23:03:46 -0700


    [ 
https://issues.apache.org/jira/browse/HUDI-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114555#comment-17114555
 ]


Yixue (Andrew) Zhu commented on HUDI-741:
-----------------------------------------

I am not sure the rationale for disallowing dropped fields during schema 
evolution is explained clearly in this Jira.

Why would a reader schema with fewer fields cause data corruption or loss if 
the change is intentional, i.e. users no longer care about the dropped fields?

> Fix Hoodie's schema evolution checks
> ------------------------------------
>
>                 Key: HUDI-741
>                 URL: https://issues.apache.org/jira/browse/HUDI-741
>             Project: Apache Hudi (incubating)
>          Issue Type: Bug
>            Reporter: Prashant Wason
>            Assignee: Prashant Wason
>            Priority: Minor
>              Labels: pull-request-available
>   Original Estimate: 120h
>          Time Spent: 20m
>  Remaining Estimate: 119h 40m
>
> HUDI requires a schema to be specified in HoodieWriteConfig, which the 
> HoodieWriteClient uses to create the records. The schema is also saved in the 
> data files (parquet format) and log files (avro format).
> Since a schema is required each time new data is ingested into a HUDI 
> dataset, the schema can evolve over time. But HUDI should ensure that the 
> evolved schema is compatible with the older schema.
> HUDI-specific validation of schema evolution should ensure that a newer 
> schema can be used for the dataset by checking that the data written using 
> the old schema can be read using the new schema.
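
The rule quoted above (data written using the old schema must be readable 
using the new schema) can be sketched roughly as follows. This is a simplified 
illustration in plain Python, not Hudi's or Avro's actual code: schemas are 
modelled as dicts of field name to {"type", optional "default"}, type 
promotion is ignored, and the helper names can_read and dropped_fields are 
hypothetical. The reader-side rule follows Avro's record resolution: a writer 
field missing from the reader is ignored, while a reader field missing from 
the writer needs a default.

```python
# Simplified sketch of a reader/writer compatibility check.
# Assumption: a schema is a dict {field_name: {"type": str, "default": optional}}.
# This is NOT Hudi's or Avro's implementation; type promotion is not modelled.

def can_read(reader_schema, writer_schema):
    """Return True if records written with writer_schema resolve under reader_schema."""
    for name, field in reader_schema.items():
        if name in writer_schema:
            # Field exists on both sides: require identical types in this sketch.
            if writer_schema[name]["type"] != field["type"]:
                return False
        elif "default" not in field:
            # Field is new on the reader side and has no default to fall back on.
            return False
    # Writer fields absent from the reader are simply ignored when reading.
    return True


def dropped_fields(reader_schema, writer_schema):
    """Return the writer fields that the reader schema no longer contains."""
    return sorted(set(writer_schema) - set(reader_schema))


old = {
    "id": {"type": "long"},
    "name": {"type": "string"},
    "legacy": {"type": "string"},
}
new_dropped = {"id": {"type": "long"}, "name": {"type": "string"}}
new_added = dict(old, email={"type": "string", "default": ""})
new_added_no_default = dict(old, email={"type": "string"})
```

Under this simplified rule, can_read(new_dropped, old) holds, so old data stays 
readable after a field is dropped; the values of "legacy" just stop being 
visible to readers. A stricter check that disallows dropped fields would 
additionally require dropped_fields(new, old) to be empty.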



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
