yihua commented on issue #5385: URL: https://github.com/apache/hudi/issues/5385#issuecomment-1111537861
> The above test case does not add _hoodie_is_deleted to the existing hudi table before writing a dataset with _hoodie_is_deleted column.

In that case, the dataset to be written carries the new schema, and Hudi automatically picks that up and evolves the table schema when writing the data.

> Also, what is the best way to modify the schema of an existing hudi table?

As @pratyakshsharma mentioned, you don't modify the schema of an existing Hudi table manually. Hudi evolves it in a backward-compatible way; you only need to provide the new schema, or data that carries the new schema. In the example I gave, the changed schema is passed to DeltaStreamer as an argument so that DeltaStreamer can pick it up and update the table with the right schema. For the Spark datasource (e.g., `df.write`), you don't even need to supply the schema separately, since the DataFrame being written already embeds a schema that includes the field `_hoodie_is_deleted`.
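To make the Spark datasource case concrete, here is a minimal PySpark-style sketch. It is an illustration under assumptions, not code from the issue: the table name `trips`, the columns `uuid`, `ts`, `region`, and `fare`, and the helpers `mark_deletes` and `write_to_hudi` are all hypothetical. The option keys are standard Hudi datasource write configs. The point is only that the `_hoodie_is_deleted` column travels with the DataFrame, so no separate schema change is issued against the table.

```python
# Sketch only: table name, column names, and helper names are hypothetical.
from typing import Any, Dict, List, Set

# Standard Hudi datasource write options; the values are made up for this example.
HUDI_OPTIONS: Dict[str, str] = {
    "hoodie.table.name": "trips",
    "hoodie.datasource.write.recordkey.field": "uuid",
    "hoodie.datasource.write.partitionpath.field": "region",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.operation": "upsert",
}


def mark_deletes(
    rows: List[Dict[str, Any]], delete_keys: Set[str], key_field: str = "uuid"
) -> List[Dict[str, Any]]:
    """Return copies of rows with a boolean _hoodie_is_deleted column added."""
    return [
        {**row, "_hoodie_is_deleted": row[key_field] in delete_keys}
        for row in rows
    ]


def write_to_hudi(spark: Any, rows: List[Dict[str, Any]], base_path: str) -> None:
    """Upsert rows via the Spark datasource.

    Requires a SparkSession started with the Hudi Spark bundle on the classpath.
    The DataFrame schema, including _hoodie_is_deleted, is inferred from the rows.
    """
    df = spark.createDataFrame(rows)
    df.write.format("hudi").options(**HUDI_OPTIONS).mode("append").save(base_path)


rows = [
    {"uuid": "a1", "ts": 2, "region": "emea", "fare": 10.5},
    {"uuid": "b2", "ts": 2, "region": "apac", "fare": 7.0},
]
# Flag record b2 for deletion; a1 is a plain upsert.
batch = mark_deletes(rows, {"b2"})
```

With a Hudi-enabled session, `write_to_hudi(spark, batch, base_path)` would submit the batch, where `base_path` is whatever location the existing table lives at. The helper is not invoked above so the snippet runs without Spark installed.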
