geserdugarov commented on issue #12133: URL: https://github.com/apache/hudi/issues/12133#issuecomment-2426314917
Currently, for this case `BucketIndexBulkInsertPartitioner` is used: https://github.com/apache/hudi/blob/5ccb19bb417d17368b0855fff041a9a129638802/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/BucketIndexBulkInsertPartitioner.java#L65-L68

The first insert uses `SingleFileHandleCreateFactory`, but a second insert uses `AppendHandleFactory` and creates a log file. I don't understand how **bulk insert to a COW table with a simple bucket index** is supposed to work by design. When we insert data that should update previous data, should we create a new parquet file with the new data and call inline compaction (because of the COW table type)? Or should we merge and write the data into a new parquet file? But then it's no longer a bulk insert.
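To illustrate the behavior described above, here is a minimal sketch (using hypothetical stand-in classes, not Hudi's real ones) of the handle-selection logic: the first bulk insert into a bucket sees no existing file group and gets a create-style handle that writes a new parquet base file, while a second bulk insert finds the bucket already occupied and falls back to an append-style handle, which writes a log file.

```java
// Hypothetical sketch of the write-handle choice per bucket; the names
// HandleKind, chooseHandle, and bucketsWithBaseFile are illustrative only.
import java.util.HashSet;
import java.util.Set;

public class BucketHandleSketch {
    enum HandleKind { SINGLE_FILE_CREATE, APPEND }

    // Tracks which buckets already have a base (parquet) file.
    private final Set<Integer> bucketsWithBaseFile = new HashSet<>();

    // Decide which kind of write handle a bulk insert would use for a bucket.
    HandleKind chooseHandle(int bucketId) {
        if (bucketsWithBaseFile.contains(bucketId)) {
            return HandleKind.APPEND;           // second insert: append a log file
        }
        bucketsWithBaseFile.add(bucketId);
        return HandleKind.SINGLE_FILE_CREATE;   // first insert: create a parquet file
    }

    public static void main(String[] args) {
        BucketHandleSketch sketch = new BucketHandleSketch();
        System.out.println(sketch.chooseHandle(0)); // first insert into bucket 0
        System.out.println(sketch.chooseHandle(0)); // second insert into bucket 0
    }
}
```

The open question is what a COW table should do in the `APPEND` branch, since a COW table is not supposed to have log files at all.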
