huliwuli commented on issue #10716: URL: https://github.com/apache/hudi/issues/10716#issuecomment-1981519987
> @huliwuli So, it looks like your per-record size is really small. Hudi uses the previous commit's statistics to guess future record sizes. For the very first commit, it relies on the config `hoodie.copyonwrite.record.size.estimate` (default 1024). So setting it to a lower value might have worked for you. Is that correct?
>
> `bulk_insert` doesn't merge small files out of the box, so you need to run a clustering job to merge them. If most of the time you just get inserts, then you may just use a COW table. I assume by "delete previous data" you mean deleting old partitions only.

Thanks for the reply. `hoodie.copyonwrite.record.size.estimate` works on my MOR table when I set it to 30-40. In most cases, we delete some rows from one old partition, but the number of rows is not predictable. We currently use MOR; if you suggest we use a COW table, can I switch to COW directly from the Hudi options?
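For readers following along, the config discussed above might look like the following in a Spark writer. This is only an illustrative sketch: the table name, base path, and the estimate value of `35` are placeholders, not part of the original discussion.

```python
# Hypothetical Hudi write options (names/values are illustrative placeholders,
# except the config keys, which are real Hudi configs).
hudi_options = {
    "hoodie.table.name": "my_table",                        # placeholder
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    # The very first commit has no prior commit statistics to estimate record
    # sizes from, so Hudi falls back to this config (default 1024 bytes).
    # A value around 30-40 matches the tiny records described in this thread.
    "hoodie.copyonwrite.record.size.estimate": "35",
}

# In a real job, these options would be passed to the DataFrame writer, e.g.:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```

Because the estimate only matters for the first commit (later commits use observed statistics), tuning it mainly affects initial file sizing.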
