Re: Insert will generate at least one file each time when each spark or spark streaming batch?

Vinoth Chandar Wed, 27 Feb 2019 11:24:46 -0800

Similarly, please try the 0.4.5 release. This has small file handling
turned on by default.

Also, please use the insert API/operation (not bulk_insert) if you want
this behavior.

Let us know if you still run into issues.
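
For illustration, a minimal sketch of the writer options this thread refers to. It assumes the property names hoodie.datasource.write.operation, hoodie.parquet.max.file.size and hoodie.parquet.small.file.limit correspond to the insert operation, limitFileSize and compactionSmallFileSize; please verify them against the configurations page for your release. The helper name and the byte values are hypothetical, chosen only for the example.

```python
MB = 1024 * 1024


def hudi_insert_options(max_file_bytes, small_file_bytes):
    """Hypothetical helper (not part of Hudi): build writer options for
    the insert operation with small file handling.

    The property names are assumptions to be checked against the
    configuration docs of the release in use.
    """
    # The max file size must not be below the small-file threshold
    # (limitFileSize >= compactionSmallFileSize, as noted in the thread).
    if max_file_bytes < small_file_bytes:
        raise ValueError("max file size must be >= small file limit")
    return {
        # Use insert, not bulk_insert, to get small file handling.
        "hoodie.datasource.write.operation": "insert",
        # Upper bound on the size of an output file.
        "hoodie.parquet.max.file.size": str(max_file_bytes),
        # Files below this size are treated as small files.
        "hoodie.parquet.small.file.limit": str(small_file_bytes),
    }


# Example values only: 120 MB max file size, 100 MB small-file threshold.
options = hudi_insert_options(120 * MB, 100 * MB)
```

With PySpark, a dict like this can be handed to a DataFrame writer via .options(**options); the rest of the write call depends on your setup.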

On Tue, Feb 26, 2019 at 11:09 PM kaka chen <kaka11.c...@gmail.com> wrote:

> Thanks!
>
> On Wed, Feb 27, 2019 at 2:56 PM nishith agarwal <n3.nas...@gmail.com> wrote:
>
> > Hi Kaka,
> >
> > Hudi automatically does file sizing for you. As you ingest more inserts,
> > the existing files will be automatically sized. You can play with a few
> > configs:
> >
> > https://hudi.apache.org/configurations.html#withStorageConfig -> This
> > config allows you to set a max size for your output file.
> > https://hudi.apache.org/configurations.html#compactionSmallFileSize ->
> > This config allows you to set the size below which a file is treated as
> > small and will be automatically sized.
> >
> > As you can guess, limitFileSize >= compactionSmallFileSize.
> > Hope this helps.
> >
> > Thanks,
> > Nishith
> >
> > On Tue, Feb 26, 2019 at 6:52 PM kaka chen <kaka11.c...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I found that insert generates at least one file for each Spark or
> > > Spark Streaming batch.
> > > Is this the expected result? If so, how can these small files be
> > > controlled? Does Hudi provide tools to compact them?
> > >
> > > Thanks,
> > > Frank
> > >
> >
>
