geserdugarov commented on issue #12133: URL: https://github.com/apache/hudi/issues/12133#issuecomment-2426314917
Currently, for this case `BucketIndexBulkInsertPartitioner` is used: https://github.com/apache/hudi/blob/5ccb19bb417d17368b0855fff041a9a129638802/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/BucketIndexBulkInsertPartitioner.java#L65-L68

The first insert uses `SingleFileHandleCreateFactory`, but a second insert uses `AppendHandleFactory` and creates a log file. I don't understand how **bulk insert to a COW table with a simple bucket index** is supposed to work by design. When we insert data that should update previous data, should we create a new parquet file with the new data and call inline compaction (because of the COW table type)? Or should we merge and write the data into a new parquet file? But then it's no longer a bulk insert.
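To illustrate the behavior described above, here is a minimal sketch (using hypothetical stand-in classes, not Hudi's real ones) of the handle-selection logic: the first bulk insert into a bucket sees no existing file group and gets a create-style handle that writes a new parquet base file, while a second bulk insert finds the bucket already occupied and falls back to an append-style handle, which writes a log file.

```java
// Hypothetical sketch of the write-handle choice per bucket; the names
// HandleKind, chooseHandle, and bucketsWithBaseFile are illustrative only.
import java.util.HashSet;
import java.util.Set;

public class BucketHandleSketch {
    enum HandleKind { SINGLE_FILE_CREATE, APPEND }

    // Tracks which buckets already have a base (parquet) file.
    private final Set<Integer> bucketsWithBaseFile = new HashSet<>();

    // Decide which kind of write handle a bulk insert would use for a bucket.
    HandleKind chooseHandle(int bucketId) {
        if (bucketsWithBaseFile.contains(bucketId)) {
            return HandleKind.APPEND;           // second insert: append a log file
        }
        bucketsWithBaseFile.add(bucketId);
        return HandleKind.SINGLE_FILE_CREATE;   // first insert: create a parquet file
    }

    public static void main(String[] args) {
        BucketHandleSketch sketch = new BucketHandleSketch();
        System.out.println(sketch.chooseHandle(0)); // first insert into bucket 0
        System.out.println(sketch.chooseHandle(0)); // second insert into bucket 0
    }
}
```

The open question is what a COW table should do in the `APPEND` branch, since a COW table is not supposed to have log files at all.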
