Hi Kaka,

Hudi automatically does file sizing for you. As you ingest more inserts, the existing small files will be automatically sized up. You can play with a few configs:

https://hudi.apache.org/configurations.html#withStorageConfig -> This config (limitFileSize) allows you to set a max size for your output file.
https://hudi.apache.org/configurations.html#compactionSmallFileSize -> This config allows you to set the size below which a file is treated as small and will be automatically sized up.

As you can guess, limitFileSize should be >= compactionSmallFileSize.

Hope this helps.

Thanks,
Nishith

On Tue, Feb 26, 2019 at 6:52 PM kaka chen <kaka11.c...@gmail.com> wrote:

> Hi All,
>
> I found Insert will generate at least one file each time when each spark or
> spark streaming batch.
> Is it expected result? If it is, how to control these small files, is hudi
> provide some tools to compact it?
>
> Thanks,
> Frank
>
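
To make the two settings from the reply a bit more concrete, here is a minimal sketch of how they might be assembled as writer options. The helper name `file_sizing_options` is hypothetical, and the property keys are my understanding of what limitFileSize and compactionSmallFileSize map to, so please confirm them against the configuration page for your Hudi version. The example sizes (120 MB and 100 MB) are illustrative only, not recommendations.

```python
# Assumed property keys for the two settings discussed above; verify them
# against the configuration docs for the Hudi version in use.
MAX_FILE_SIZE_KEY = "hoodie.parquet.max.file.size"        # limitFileSize
SMALL_FILE_LIMIT_KEY = "hoodie.parquet.small.file.limit"  # compactionSmallFileSize


def file_sizing_options(max_file_bytes: int, small_file_limit_bytes: int) -> dict:
    """Hypothetical helper: build the two file-sizing options as strings.

    Enforces the constraint from the reply: the max file size must be
    greater than or equal to the small-file threshold.
    """
    if max_file_bytes <= 0:
        raise ValueError("max_file_bytes must be positive")
    if small_file_limit_bytes < 0:
        raise ValueError("small_file_limit_bytes must not be negative")
    if max_file_bytes < small_file_limit_bytes:
        raise ValueError("max file size must be >= small file limit")
    return {
        MAX_FILE_SIZE_KEY: str(max_file_bytes),
        SMALL_FILE_LIMIT_KEY: str(small_file_limit_bytes),
    }


# Illustrative values only: 120 MB max file size, 100 MB small-file threshold.
options = file_sizing_options(120 * 1024 * 1024, 100 * 1024 * 1024)
```

With PySpark, a dict like this could presumably be passed along with your other write options through `DataFrameWriter.options(**options)`; that part depends on how your job is set up.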