Re: [PR] [HUDI-9468] Parquet Binary Copy at Rowgroup Level [hudi]

via GitHub Tue, 10 Jun 2025 05:21:38 -0700


xiarixiaoyao commented on PR #13365:
URL: https://github.com/apache/hudi/pull/13365#issuecomment-2958992893


   @zhangyue19921010 @danny0405 please check that the table schema matches 
the Parquet file schema before doing a binary copy.
   Spark BulkInsert produces Parquet files whose columns are marked required; 
however, those columns are optional in Hudi itself.
   Doing a binary copy in that case will corrupt the Parquet file, which is a 
serious problem; see 
https://github.com/apache/spark/blob/1a3ae66c6c48bb319f0798826085e694fa7d0b58/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java#L389.
   Suggestion: throw an exception directly when such a mismatch is found.
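
   A minimal sketch of the kind of check meant here, not the actual Hudi or 
Parquet API. The class `SchemaRepetitionCheck`, its `Repetition` enum, and 
both methods are hypothetical, and each schema is modelled as a plain map 
from column name to repetition instead of a real Parquet `MessageType`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compares column repetition between the table schema
// and the schema read from a Parquet file footer before a binary copy.
final class SchemaRepetitionCheck {

    // Simplified stand-in for a Parquet field's repetition (assumption).
    enum Repetition { REQUIRED, OPTIONAL }

    // Returns one description per column whose repetition differs.
    // Columns absent from the table schema are ignored in this sketch.
    static List<String> findMismatches(Map<String, Repetition> tableSchema,
                                       Map<String, Repetition> fileSchema) {
        List<String> mismatches = new ArrayList<>();
        for (Map.Entry<String, Repetition> e : fileSchema.entrySet()) {
            Repetition expected = tableSchema.get(e.getKey());
            if (expected != null && expected != e.getValue()) {
                mismatches.add(e.getKey() + ": file=" + e.getValue()
                        + ", table=" + expected);
            }
        }
        return mismatches;
    }

    // Fails fast instead of copying row groups under a mismatched schema.
    static void validateBeforeBinaryCopy(Map<String, Repetition> tableSchema,
                                         Map<String, Repetition> fileSchema) {
        List<String> mismatches = findMismatches(tableSchema, fileSchema);
        if (!mismatches.isEmpty()) {
            throw new IllegalStateException(
                    "Refusing binary copy, schema mismatch: " + mismatches);
        }
    }
}
```

   In a real implementation the two maps would be derived from the table 
schema and from each input file's footer, and the check would run once per 
input file before any row group is copied.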
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
