Nishith,

Thanks, will try it.

Thanks,
Frank

On Tue, Mar 12, 2019 at 11:21 AM nishith agarwal <n3.nas...@gmail.com> wrote:

> Frank,
>
> You can tune a couple of configs to keep a given number of older file
> versions. Take a look at these configs:
> https://hudi.apache.org/configurations.html#withCompactionConfig
> Specifically, you can choose the number of commits to retain; here, each
> commit corresponds to one file version.
> Depending on how long your queries run, you may want to keep the older
> data files for a configured amount of time, long enough for a running
> query to finish reading them, after which they will be cleaned.
>
> Thanks,
> Nishith
>
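A minimal sketch of the retention idea above, written as Python that builds Hudi write options. The key names `hoodie.cleaner.policy` and `hoodie.cleaner.commits.retained` are assumed from the Hudi configuration docs and should be checked against your release; `retained_commits_for` is a hypothetical rule-of-thumb helper, not part of Hudi, and the 10-minute/45-minute numbers are made up for illustration.

```python
import math


def retained_commits_for(longest_query_minutes: float, commit_interval_minutes: float) -> int:
    """Hypothetical rule of thumb: retain the file version a query is reading
    plus every commit that can land while that query is still running."""
    return math.ceil(longest_query_minutes / commit_interval_minutes) + 1


def cleaner_options(retained_commits: int) -> dict:
    # Assumed Hudi config keys; verify them against the docs for your release.
    return {
        "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",
        "hoodie.cleaner.commits.retained": str(retained_commits),
    }


# Example: a commit every 10 minutes, longest query runs 45 minutes.
opts = cleaner_options(retained_commits_for(45, 10))
print(opts)
# With PySpark these would be passed alongside the other write options, e.g.
# df.write.format("com.uber.hoodie").options(**opts)...
```

Counting retention in commits rather than wall-clock time follows the "commits = versions" framing in the reply: the slower your commit cadence relative to your longest query, the fewer commits you need to keep.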
> On Mon, Mar 11, 2019 at 7:42 PM kaka chen <kaka11.c...@gmail.com> wrote:
>
> > Hi Vinoth,
> >
> > When using this feature, I find that each insert writes a new file that
> > also contains the previously inserted records.
> > But how do I clean up the old files when using COW (copy-on-write) tables?
> >
> > Thanks,
> > Frank
> >
> > On Thu, Feb 28, 2019 at 3:24 AM Vinoth Chandar <vin...@apache.org> wrote:
> >
> > > Similarly, please try the 0.4.5 release. It has small file handling
> > > turned on by default.
> > >
> > > Also, please use the insert API/operation (not bulk_insert) if you want
> > > this behavior.
> > >
> > > Let us know if you still run into issues.
> > >
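A minimal sketch of selecting the insert operation, as suggested above, when writing through the Spark datasource from Python. The option keys `hoodie.table.name` and `hoodie.datasource.write.operation` are assumed from the Hudi docs and may differ by version; `write_options` and the table name `my_table` are hypothetical placeholders.

```python
def write_options(table_name: str, operation: str = "insert") -> dict:
    """Build Hudi write options. Per the advice in this thread, use "insert"
    (not "bulk_insert") if new records should go through small-file handling."""
    # Assumed set of operation values; check your Hudi version's docs.
    if operation not in {"insert", "upsert", "bulk_insert"}:
        raise ValueError(f"unsupported operation: {operation}")
    return {
        # Assumed Hudi option keys; verify against your release.
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
    }


opts = write_options("my_table")  # "my_table" is a placeholder table name
print(opts["hoodie.datasource.write.operation"])  # insert
```

Defaulting to "insert" makes the small-file-friendly path the one you get unless you explicitly ask for bulk_insert.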
> > > On Tue, Feb 26, 2019 at 11:09 PM kaka chen <kaka11.c...@gmail.com> wrote:
> > >
> > > > Thanks!
> > > >
> > > > On Wed, Feb 27, 2019 at 2:56 PM nishith agarwal <n3.nas...@gmail.com> wrote:
> > > >
> > > > > Hi Kaka,
> > > > >
> > > > > Hudi automatically does file sizing for you. As you ingest more
> > > > > inserts, existing small files will be automatically sized up. You
> > > > > can tune a few configs:
> > > > >
> > > > > https://hudi.apache.org/configurations.html#withStorageConfig -> This
> > > > > config allows you to set a max size for your output files.
> > > > >
> > > > > https://hudi.apache.org/configurations.html#compactionSmallFileSize ->
> > > > > This config sets the size below which a file is considered small and
> > > > > will be automatically sized up by new inserts.
> > > > >
> > > > > As you can guess, limitFileSize >= compactionSmallFileSize.
> > > > > Hope this helps.
> > > > >
> > > > > Thanks,
> > > > > Nishith
> > > > >
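A minimal sketch of the two file-sizing knobs above as Hudi write options, with the limitFileSize >= compactionSmallFileSize constraint checked up front. The keys `hoodie.parquet.max.file.size` (assumed to correspond to limitFileSize) and `hoodie.parquet.small.file.limit` (assumed to correspond to compactionSmallFileSize) are taken from the Hudi docs and should be verified for your version; `file_sizing_options` and the 120 MB / 100 MB values are hypothetical.

```python
MB = 1024 * 1024


def file_sizing_options(max_file_mb: int, small_file_mb: int) -> dict:
    """Hypothetical helper: build the two sizing options and enforce
    limitFileSize >= compactionSmallFileSize before any write happens."""
    if small_file_mb > max_file_mb:
        raise ValueError("small file limit must not exceed the max file size")
    return {
        # Assumed key for the max output file size (limitFileSize).
        "hoodie.parquet.max.file.size": str(max_file_mb * MB),
        # Assumed key for the small-file threshold (compactionSmallFileSize):
        # files below this size are candidates to receive new inserts.
        "hoodie.parquet.small.file.limit": str(small_file_mb * MB),
    }


opts = file_sizing_options(120, 100)
print(opts)
```

Failing fast on an inverted pair is cheaper than discovering it after a batch of ingest has already produced oddly sized files.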
> > > > > On Tue, Feb 26, 2019 at 6:52 PM kaka chen <kaka11.c...@gmail.com> wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I found that insert generates at least one new file for each Spark
> > > > > > or Spark Streaming batch.
> > > > > > Is this the expected result? If so, how can I control these small
> > > > > > files? Does Hudi provide any tools to compact them?
> > > > > >
> > > > > > Thanks,
> > > > > > Frank
> > > > > >
> > > > >
> > > >
> > >
> >
>
