Similarly, please try the 0.4.5 release. It has small file handling turned on by default.
Also, please use the insert API/operation (not bulk_insert) if you want this behavior. Let us know if you still run into issues.

On Tue, Feb 26, 2019 at 11:09 PM kaka chen <kaka11.c...@gmail.com> wrote:
> Thanks!
>
> nishith agarwal <n3.nas...@gmail.com> wrote on Wed, Feb 27, 2019 at 2:56 PM:
>
> > Hi Kaka,
> >
> > Hudi automatically does file sizing for you. As you ingest more inserts,
> > the existing files will be automatically sized. You can play with a few
> > configs:
> >
> > https://hudi.apache.org/configurations.html#withStorageConfig -> This
> > config allows you to set a max size for your output file.
> > https://hudi.apache.org/configurations.html#compactionSmallFileSize ->
> > This config sets the size below which a file counts as "small"; new
> > inserts are routed into such files until they grow past it.
> >
> > As you can guess, limitFileSize must be >= compactionSmallFileSize.
> > Hope this helps.
> >
> > Thanks,
> > Nishith
> >
> > On Tue, Feb 26, 2019 at 6:52 PM kaka chen <kaka11.c...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I found that an insert generates at least one new file for each Spark
> > > or Spark Streaming batch.
> > > Is this the expected result? If so, how can these small files be
> > > controlled? Does Hudi provide any tools to compact them?
> > >
> > > Thanks,
> > > Frank
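For anyone landing on this thread later: the advice above (use insert rather than bulk_insert, keep the max file size at or above the small-file threshold) can be sketched as a set of Spark writer options. This is a minimal sketch, not an official helper: `hudi_small_file_options` and the table name are hypothetical, the sizes are illustrative rather than Hudi defaults, and the property keys (`hoodie.datasource.write.operation`, `hoodie.parquet.max.file.size`, `hoodie.parquet.small.file.limit`) are my understanding of what `limitFileSize` and `compactionSmallFileSize` map to, so check them against the configuration page for your release.

```python
# Hypothetical helper: builds Hudi writer options for small-file handling.
# Keys are assumed to correspond to limitFileSize / compactionSmallFileSize;
# the default sizes below are illustrative, not Hudi's own defaults.

MB = 1024 * 1024


def hudi_small_file_options(table_name, max_file_mb=120, small_file_mb=100):
    """Return writer options that use the insert operation (not bulk_insert)
    and enforce max file size >= small-file threshold."""
    if max_file_mb <= 0 or small_file_mb < 0:
        raise ValueError("invalid file size")
    if max_file_mb < small_file_mb:
        raise ValueError("limitFileSize must be >= compactionSmallFileSize")
    return {
        "hoodie.table.name": table_name,
        # insert goes through small-file handling; bulk_insert does not
        "hoodie.datasource.write.operation": "insert",
        "hoodie.parquet.max.file.size": str(max_file_mb * MB),
        "hoodie.parquet.small.file.limit": str(small_file_mb * MB),
    }


opts = hudi_small_file_options("my_table")
print(opts["hoodie.parquet.max.file.size"])  # 125829120
```

The resulting dict would be passed to the Spark DataFrame writer's `options(...)` along with the Hudi format name for your release; rejecting a max size below the small-file threshold up front mirrors the constraint Nishith mentions.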