Nishith,

Thanks, will try it.

Thanks,
Frank

nishith agarwal <n3.nas...@gmail.com> wrote on Tue, Mar 12, 2019 at 11:21 AM:

> Frank,
>
> You can play with a couple of configs to keep X number of older file
> versions. Take a look at these configs:
> https://hudi.apache.org/configurations.html#withCompactionConfig.
> Specifically, you can choose the number of commits you want to keep; here,
> commits = versions.
> Depending on how long your query runs, you might want to keep the older
> data file for a configured amount of time, after which it will be cleaned.
>
> Thanks,
> Nishith
>
> On Mon, Mar 11, 2019 at 7:42 PM kaka chen <kaka11.c...@gmail.com> wrote:
>
> > Hi Vinoth,
> >
> > When I use this feature, I find that it writes a new file containing the
> > old inserted records.
> > But how do I clean up the old files when using COW tables?
> >
> > Thanks,
> > Frank
> >
> > Vinoth Chandar <vin...@apache.org> wrote on Thu, Feb 28, 2019 at 3:24 AM:
> >
> > > Similarly, please try the 0.4.5 release. This has small file handling
> > > turned on by default.
> > >
> > > Also please use the insert api/operation (not bulk_insert) if you want
> > > this behavior.
> > >
> > > Let us know if you still run into issues.
> > >
> > > On Tue, Feb 26, 2019 at 11:09 PM kaka chen <kaka11.c...@gmail.com> wrote:
> > >
> > > > Thanks!
> > > >
> > > > nishith agarwal <n3.nas...@gmail.com> wrote on Wed, Feb 27, 2019 at 2:56 PM:
> > > >
> > > > > Hi Kaka,
> > > > >
> > > > > Hudi automatically does file sizing for you. As you ingest more
> > > > > inserts, the existing file will be automatically sized. You can play
> > > > > with a few configs:
> > > > >
> > > > > https://hudi.apache.org/configurations.html#withStorageConfig ->
> > > > > This config allows you to set a max size for your output file.
> > > > >
> > > > > https://hudi.apache.org/configurations.html#compactionSmallFileSize ->
> > > > > This config allows you to set a minimum file size; files below it
> > > > > will be automatically sized.
> > > > >
> > > > > As you can guess, limitFileSize >= compactionSmallFileSize.
> > > > > Hope this helps.
> > > > >
> > > > > Thanks,
> > > > > Nishith
> > > > >
> > > > > On Tue, Feb 26, 2019 at 6:52 PM kaka chen <kaka11.c...@gmail.com> wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I found that Insert generates at least one file for each Spark or
> > > > > > Spark Streaming batch.
> > > > > > Is this the expected result? If it is, how can I control these small
> > > > > > files? Does Hudi provide some tools to compact them?
> > > > > >
> > > > > > Thanks,
> > > > > > Frank
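
The settings discussed in this thread (max file size, small file limit, commits retained by the cleaner, and the insert operation) could be collected roughly as follows. This is only a sketch under assumptions: the property keys are the string forms believed to correspond to limitFileSize, compactionSmallFileSize and retainCommits, and should be checked against the configurations page for the release in use; the byte values are illustrative, and `check_file_sizing` is a hypothetical helper, not part of Hudi.

```python
MB = 1024 * 1024

# Illustrative values only; tune them for your workload and verify the key
# names against the configurations page for your Hudi release.
hudi_options = {
    # Upper bound on the size of a data file (limitFileSize).
    "hoodie.parquet.max.file.size": str(120 * MB),
    # Files below this size are candidates to receive new inserts
    # (compactionSmallFileSize).
    "hoodie.parquet.small.file.limit": str(100 * MB),
    # Number of commits (file versions) the cleaner keeps around.
    "hoodie.cleaner.commits.retained": str(10),
    # Small file handling applies to insert, not bulk_insert.
    "hoodie.datasource.write.operation": "insert",
}


def check_file_sizing(options):
    """Hypothetical helper: enforce limitFileSize >= compactionSmallFileSize."""
    max_size = int(options["hoodie.parquet.max.file.size"])
    small_limit = int(options["hoodie.parquet.small.file.limit"])
    if max_size < small_limit:
        raise ValueError(
            "max file size (%d) must be >= small file limit (%d)"
            % (max_size, small_limit)
        )
    return True


check_file_sizing(hudi_options)
```

A dictionary like this would typically be passed as writer options on a Spark DataFrame write; how long to retain commits depends on how long the longest-running query needs the older file versions.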