xiarixiaoyao commented on PR #5376: URL: https://github.com/apache/hudi/pull/5376#issuecomment-1104659123
@alexeykudinkin Thank you very much for your review. I have addressed all comments and added more tests for nested rename operations.

Regarding HUDI-3855: it rewrites each old record with the new schema before writing it to the parquet file. This breaks the schema-evolution rename scenario. The old parquet file still uses the old column name, so when we rewrite an old record with the new schema, the value stored under the old name is lost. That is a serious problem. For example:

1) The current COW Hudi table has an old parquet file whose schema is: a int, b string
2) We rename a -> aa, so the table's new schema is: aa int, b string
3) We insert new data into the table. During the insert we need to read the old records from the old parquet file.

**Before HUDI-3855**: we read the old record and write it directly to the new parquet file, so the rename has no effect on it.

**After HUDI-3855**: before writing an old record, we must rewrite it with the new schema. The old record's schema is `a int, b string`, but the new schema is `aa int, b string`. If we force the rewrite, the value of column `a` is lost, because `a` does not exist in the new schema.

-- This is an automated message from the Apache Git Service.
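A minimal Python sketch of the problem described above, using plain dicts in place of Avro records. The schema model, the function names `rewrite_by_name` and `rewrite_with_renames`, and the `rename_cols` map (full new field name -> old field name) are hypothetical illustrations, not Hudi's actual API. It shows that a purely by-name rewrite drops the value of the renamed column `a`, and one way to keep it by consulting a rename map, including for a nested field.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema model: a list of (field_name, field_type) pairs, where
# field_type is a primitive type name or a nested schema (another such list).
Schema = List[Tuple[str, Any]]


def rewrite_by_name(record: Dict[str, Any], new_schema: Schema) -> Dict[str, Any]:
    """Naive rewrite: copy every new-schema field from the old record by name.

    A field whose name is absent from the old record becomes None, so the
    value of a renamed column is silently lost.
    """
    out: Dict[str, Any] = {}
    for name, ftype in new_schema:
        value = record.get(name)
        if isinstance(ftype, list) and isinstance(value, dict):
            value = rewrite_by_name(value, ftype)
        out[name] = value
    return out


def rewrite_with_renames(
    record: Dict[str, Any],
    new_schema: Schema,
    rename_cols: Dict[str, str],
    prefix: str = "",
) -> Dict[str, Any]:
    """Rename-aware rewrite (hypothetical helper).

    rename_cols maps a full, dotted NEW field name to the OLD field name at
    the same nesting level, e.g. {"aa": "a"} or {"c.xx": "x"}. The rename map
    is consulted first and the new name is used as a fallback.
    """
    out: Dict[str, Any] = {}
    for name, ftype in new_schema:
        full_name = prefix + name
        old_name = rename_cols.get(full_name, name)
        value = record.get(old_name)
        if isinstance(ftype, list) and isinstance(value, dict):
            # Recurse into the nested record, extending the dotted prefix.
            value = rewrite_with_renames(value, ftype, rename_cols, full_name + ".")
        out[name] = value
    return out


if __name__ == "__main__":
    old_record = {"a": 1, "b": "x"}  # written with schema: a int, b string
    new_schema: Schema = [("aa", "int"), ("b", "string")]  # after a -> aa
    print(rewrite_by_name(old_record, new_schema))  # {'aa': None, 'b': 'x'}
    print(rewrite_with_renames(old_record, new_schema, {"aa": "a"}))  # {'aa': 1, 'b': 'x'}
```

The rename map is checked before falling back to the same name so that swapped renames (a -> b and b -> a at once) still resolve to the right old column.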
