Hi Kaka,

Hudi does file sizing for you automatically. As you ingest more inserts,
they are routed into the existing small files, so those files grow toward
the configured size. You can tune this with a few configs:

https://hudi.apache.org/configurations.html#withStorageConfig -> This
config (limitFileSize) lets you set the maximum size of an output file.
https://hudi.apache.org/configurations.html#compactionSmallFileSize -> This
config lets you set the threshold below which a file counts as small and
is eligible for automatic sizing.

As you can guess, limitFileSize should be >= compactionSmallFileSize.
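
A minimal sketch of setting the two sizes together, in case it helps. The
property keys and the 120 MB / 100 MB values are assumptions on my part
(I believe these are the keys behind limitFileSize and
compactionSmallFileSize, but please check them against the configuration
page for your Hudi version). The helper file_sizing_options is hypothetical
and only enforces the ordering constraint mentioned above.

```python
# Sketch only. The two property keys are assumed to be the ones behind
# limitFileSize and compactionSmallFileSize; verify them for your version.
MB = 1024 * 1024


def file_sizing_options(max_file_bytes, small_file_bytes):
    """Build writer options, enforcing max file size >= small file limit."""
    if max_file_bytes <= 0 or small_file_bytes < 0:
        raise ValueError("sizes must be positive (small file limit may be 0)")
    if max_file_bytes < small_file_bytes:
        raise ValueError("max file size must be >= small file limit")
    return {
        # Upper bound on the size of a file written by Hudi.
        "hoodie.parquet.max.file.size": str(max_file_bytes),
        # Files below this size are treated as small and receive new inserts.
        "hoodie.parquet.small.file.limit": str(small_file_bytes),
    }


# Illustrative values, not a recommendation.
options = file_sizing_options(120 * MB, 100 * MB)
```

With PySpark, a dict like this can be handed to the writer via
.options(**options) alongside your other Hudi write options.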
Hope this helps.

Thanks,
Nishith

On Tue, Feb 26, 2019 at 6:52 PM kaka chen <kaka11.c...@gmail.com> wrote:

> Hi All,
>
> I found that an insert generates at least one new file for each Spark or
> Spark Streaming batch.
> Is this the expected result? If so, how can I control these small files?
> Does Hudi provide any tools to compact them?
>
> Thanks,
> Frank
>
