xiarixiaoyao commented on PR #5376:
URL: https://github.com/apache/hudi/pull/5376#issuecomment-1104659123

   @alexeykudinkin 
   Thank you very much for your review. I have addressed all comments
   and added more tests for nested rename operations.
   
   Since HUDI-3855, we rewrite each old record with the new schema before writing it to the parquet file.
   This breaks the schema evolution rename scenario: the old parquet file still stores the column under its old name, so when we rewrite the old record with the new schema, the value that belongs to the old name is lost. This is a serious data-loss problem.
   For example:
   1) The current COW (copy-on-write) hoodie table has an old parquet file whose schema is: a int, b string.
   2) We rename a -> aa, so the new schema of the hoodie table is: aa int, b string.
   3) We insert new data into the hoodie table. During the insert operation we need to read the old records from the old parquet file.
   **Before HUDI-3855**: we could read the old record and write it directly to the new parquet file, so the rename operation had no influence on it.
   **After HUDI-3855**: before we write the old record, we need to rewrite it with the new schema. The schema of the old record is a int, b string, but the new schema is aa int, b string. If we force this rewrite, we lose the value of column a, because a does not exist in the new schema.
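   The failure in step 3 can be sketched in Python. This is a hypothetical model, not Hudi's actual code: records are plain dicts, a schema is a dict of field name to type (a nested dict stands for a struct), and `rename_cols` is an assumed map from the full new field name to the old field name, computed between the old file's schema and the new table schema. `rewrite_naive` matches fields by name only, like the forced rewrite described above; `rewrite_with_renames` shows one possible way to keep the value by reading from the old name, including for nested fields.

```python
from typing import Any, Dict

# Hypothetical model: a record is a dict; a schema is a dict of
# field name -> type, where a nested dict stands for a struct type.
Record = Dict[str, Any]
Schema = Dict[str, Any]


def rewrite_naive(old: Record, new_schema: Schema) -> Record:
    """Rewrite by field name only: a renamed column comes out as None."""
    out: Record = {}
    for name, ftype in new_schema.items():
        value = old.get(name)  # the old file still uses the old name, so this misses
        if isinstance(ftype, dict) and isinstance(value, dict):
            value = rewrite_naive(value, ftype)
        out[name] = value
    return out


def rewrite_with_renames(old: Record, new_schema: Schema,
                         rename_cols: Dict[str, str], prefix: str = "") -> Record:
    """Rewrite using rename_cols (assumed: full new name -> old field name)."""
    out: Record = {}
    for name, ftype in new_schema.items():
        full_name = prefix + name
        # Read from the old name if this field was renamed, else from the same name.
        source = rename_cols.get(full_name, name)
        value = old.get(source)
        if isinstance(ftype, dict) and isinstance(value, dict):
            # Nested renames are keyed by the full new path, e.g. "cc.xx".
            value = rewrite_with_renames(value, ftype, rename_cols, full_name + ".")
        out[name] = value
    return out


if __name__ == "__main__":
    old_record = {"a": 1, "b": "x"}            # read from the old parquet file
    new_schema = {"aa": "int", "b": "string"}  # after renaming a -> aa
    print(rewrite_naive(old_record, new_schema))                      # {'aa': None, 'b': 'x'}
    print(rewrite_with_renames(old_record, new_schema, {"aa": "a"}))  # {'aa': 1, 'b': 'x'}
```

   The naive rewrite silently turns the value of a into None under aa, which is the data loss described above. Keying nested renames by their full new path is a design choice of this sketch; it lets a renamed struct (c -> cc) and a renamed field inside it (cc.xx from x) be resolved in one pass.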


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
