yihua commented on issue #5385:
URL: https://github.com/apache/hudi/issues/5385#issuecomment-1111537861

   > The above test case does not add _hoodie_is_deleted to the existing hudi 
table before writing a dataset with _hoodie_is_deleted column.
   
   In that case, the dataset to be written has the new schema, and Hudi 
automatically picks that up and evolves the table schema when writing the data.
   
   > Also, what is the best way to modify the schema of an existing hudi table?
   
   As @pratyakshsharma mentioned, you don't need to modify the schema of an 
existing Hudi table manually.  Hudi evolves it in a backward-compatible way.  
You only need to provide the new schema, or data that carries the new schema.  
In the example I gave, the changed schema is passed to DeltaStreamer as an 
argument so that DeltaStreamer picks it up and updates the table with the 
right schema.  For the Spark datasource (e.g., `df.write`), you don't even 
need to supply a schema separately, since the DataFrame being written already 
embeds a schema containing the field `_hoodie_is_deleted`.
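
   To make the two paths above concrete, here is a minimal Python sketch, not 
the exact setup from this issue.  The helper names (`add_delete_field`, 
`hudi_write_options`, `write_with_delete_marker`), the table name, and the key 
fields are hypothetical.  The option keys are standard Hudi Spark datasource 
write configs.  The first helper shows the kind of backward-compatible change 
(a new field with a default) you would make to an Avro schema before handing it 
to DeltaStreamer; the last one shows that a plain `df.write` needs no separate 
schema step.

```python
import copy

DELETE_FIELD = "_hoodie_is_deleted"


def add_delete_field(avro_schema: dict) -> dict:
    """Return a copy of an Avro record schema with the delete marker appended.

    Appending a field that has a default value is a backward-compatible
    change, so existing records remain readable under the evolved schema.
    """
    evolved = copy.deepcopy(avro_schema)
    if all(field["name"] != DELETE_FIELD for field in evolved["fields"]):
        evolved["fields"].append(
            {"name": DELETE_FIELD, "type": "boolean", "default": False}
        )
    return evolved


def hudi_write_options(table_name: str, record_key: str, precombine_field: str) -> dict:
    """Build a small set of Hudi write options (all values here are assumptions)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }


def write_with_delete_marker(df, base_path: str, options: dict) -> None:
    """Upsert a Spark DataFrame that carries `_hoodie_is_deleted` into a Hudi table.

    No schema is passed explicitly: the DataFrame's own schema, including the
    new column, is what Hudi sees on write.
    """
    # Imported lazily so the pure-Python helpers above work without Spark.
    from pyspark.sql import functions as F

    if DELETE_FIELD not in df.columns:
        # Rows default to "not deleted"; set True on rows that should be removed.
        df = df.withColumn(DELETE_FIELD, F.lit(False))
    df.write.format("hudi").options(**options).mode("append").save(base_path)
```

   With a running Spark session and the Hudi bundle on the classpath, a call 
would look like `write_with_delete_marker(df, "/tmp/hudi/trips", 
hudi_write_options("trips", "id", "ts"))`, where the path and field names are 
placeholders.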


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.