Cool. So, the cleaning policy determines how we clean up older versions of
file groups (simplistically, old parquet and log files) to bound storage
growth. There are two policies:

KEEP_LATEST_COMMITS (default): Retains (does not delete) any file (slice)
that was touched in the last X commits. The idea here is that you are able
to pull up to X commits' worth of incremental changes.

KEEP_LATEST_FILE_VERSIONS: If you are not interested in incremental pull
at all, you can choose to just retain X files (slices) per file group (i.e.,
files that share the same prefix) instead. This could result in fewer files
in some cases.
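
To make the difference concrete, here is a toy model in Python. It is only a
sketch of the two rules as described above, not Hudi's actual cleaner: the
function names, the (file_group, commit) tuples, and the integer commit times
are all made up for illustration, and I am assuming the newest slice of every
file group is always kept so that no live data is dropped.

```python
from collections import defaultdict

# Hypothetical toy model: a file slice is (file_group_id, commit_time).
# None of these names come from Hudi itself.

def keep_latest_commits(slices, commits, x):
    """Keep slices written by the last x commits, plus each group's newest slice."""
    if x < 1:
        raise ValueError("x must be >= 1")
    retained_commits = set(sorted(commits)[-x:])
    latest = {}
    for group, commit in slices:
        latest[group] = max(latest.get(group, commit), commit)
    return {
        (group, commit)
        for group, commit in slices
        if commit in retained_commits or commit == latest[group]
    }

def keep_latest_file_versions(slices, x):
    """Keep the newest x slices of every file group, regardless of commit age."""
    if x < 1:
        raise ValueError("x must be >= 1")
    by_group = defaultdict(list)
    for group, commit in slices:
        by_group[group].append(commit)
    return {
        (group, commit)
        for group, group_commits in by_group.items()
        for commit in sorted(group_commits)[-x:]
    }

commits = [1, 2, 3, 4, 5]
slices = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("A", 5),
          ("B", 1), ("B", 5), ("C", 2)]

print(sorted(keep_latest_commits(slices, commits, 2)))
# [('A', 4), ('A', 5), ('B', 5), ('C', 2)]
print(sorted(keep_latest_file_versions(slices, 2)))
# [('A', 4), ('A', 5), ('B', 1), ('B', 5), ('C', 2)]
```

In this example the first rule keeps only what an incremental pull over the
last two commits would need, while the second keeps two slices of group B no
matter how old the first one is.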

In practice, we always use KEEP_LATEST_COMMITS. I keep thinking about
starting a discussion to retire KEEP_LATEST_FILE_VERSIONS, actually.

Hope that helps.

On Tue, Jun 11, 2019 at 9:05 AM Gary Li <[email protected]> wrote:

> Hello Vinoth,
>
> Yes, that’s what I mean.
>
> Thanks
> Gary
>
> On Tue, Jun 11, 2019 at 9:03 AM Vinoth Chandar <[email protected]> wrote:
>
> > Hi Gary,
> >
> > Do you mean cleaning policy? KEEP_LATEST_FILE_VERSIONS vs
> > KEEP_LATEST_COMMITS?
> >
> > Thanks
> > Vinoth
> >
> > On Mon, Jun 10, 2019 at 9:47 PM Gary Li <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I am a little confused when I was looking at the compaction policy.
> > > What is the difference between KEEP_LATEST_COMMIT vs
> > > KEEP_LATEST_VERSION? What is the exact definition of "COMMIT" and
> > > "VERSION"?
> > >
> > > Thanks,
> > > Gary
> > >
> >
>
