ad1happy2go commented on issue #10716: URL: https://github.com/apache/hudi/issues/10716#issuecomment-1980854966
@huliwuli So it looks like your per-record size is really small. Hudi uses the previous commit's statistics to estimate future record sizes; for the very first commit, it relies on the config `hoodie.copyonwrite.record.size.estimate` (default 1024). Setting it to a lower value might have worked for you. Is that correct?

`bulk_insert` doesn't merge small files out of the box, so you need to run a clustering job to merge them. If most of the time you just get inserts, you may simply use a COW table. I assume that by "delete previous data" you mean deleting old partitions only.
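For reference, a minimal sketch of the two knobs discussed above as Hudi write options (a hypothetical PySpark write; the table name, estimate value, and commit interval are placeholders to tune for your workload, not recommendations):

```python
# Hypothetical Hudi write options illustrating the first-commit record-size
# estimate and inline clustering to merge the small files bulk_insert leaves.
hudi_options = {
    "hoodie.table.name": "my_table",                       # placeholder name
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "bulk_insert",
    # Used only until Hudi has prior-commit statistics; lower it when
    # individual records are much smaller than the 1024-byte default.
    "hoodie.copyonwrite.record.size.estimate": "64",
    # Inline clustering periodically rewrites small files into larger ones.
    "hoodie.clustering.inline": "true",
    "hoodie.clustering.inline.max.commits": "4",
}

# Example usage (df is an existing Spark DataFrame):
# df.write.format("hudi").options(**hudi_options).mode("append").save("/tmp/my_table")
```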
