Cool. So, the cleaning policy determines how we clean up older versions of file groups (simplistically, old parquet and log files) to bound storage growth:
KEEP_LATEST_COMMITS (default): Retains (does not delete) any file (slice) that was touched in the last X commits. The idea here is that you are able to pull the incremental changes worth up to X commits.

KEEP_LATEST_FILE_VERSIONS: If you are not interested in incremental pull at all, you can choose to just retain X files (slices) per file group (i.e. files that share the same prefix) instead. This could result in fewer files in some cases.

In practice, we always use KEEP_LATEST_COMMITS. I keep thinking about starting a discussion to retire KEEP_LATEST_FILE_VERSIONS, actually.

Hope that helps.

On Tue, Jun 11, 2019 at 9:05 AM Gary Li <[email protected]> wrote:

> Hello Vinoth,
>
> Yes, that’s what I mean.
>
> Thanks
> Gary
>
> On Tue, Jun 11, 2019 at 9:03 AM Vinoth Chandar <[email protected]> wrote:
>
> > Hi Gary,
> >
> > Do you mean cleaning policy? KEEP_LATEST_FILE_VERSIONS vs
> > KEEP_LATEST_COMMITS ?
> >
> > Thanks
> > Vinoth
> >
> > On Mon, Jun 10, 2019 at 9:47 PM Gary Li <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I am a little confused when I was looking at the compaction policy. What is
> > > the difference between KEEP_LATEST_COMMIT vs KEEP_LATEST_VERSION? What is
> > > the exact definition of "COMMIT" and "VERSION"?
> > >
> > > Thanks,
> > > Gary
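
To make the difference between KEEP_LATEST_COMMITS and KEEP_LATEST_FILE_VERSIONS concrete, here is a minimal Python sketch of a toy model. It is not the actual cleaner code: the function names, the (file_group, commit) representation of a file slice, and the rule that the newest slice of a file group is never dropped are all assumptions made for illustration, and the real cleaner may apply additional rules.

```python
from typing import Dict, List, Set, Tuple

# Toy model (hypothetical): each file group maps to the commit times
# at which a new file slice was written for that group.
FileGroups = Dict[str, List[int]]
Slice = Tuple[str, int]  # (file group id, commit time)


def keep_latest_commits(groups: FileGroups, timeline: List[int], x: int) -> Set[Slice]:
    """Retain every slice written by one of the last x commits on the timeline."""
    if x < 1:
        raise ValueError("x must be at least 1")
    recent = set(sorted(timeline)[-x:])
    kept: Set[Slice] = set()
    for group, commits in groups.items():
        for commit in commits:
            if commit in recent:
                kept.add((group, commit))
        # Assumption of this sketch: the newest slice of a group always
        # survives, so a group that was not touched recently keeps its data.
        kept.add((group, max(commits)))
    return kept


def keep_latest_file_versions(groups: FileGroups, x: int) -> Set[Slice]:
    """Retain the newest x slices of every file group, regardless of the timeline."""
    if x < 1:
        raise ValueError("x must be at least 1")
    kept: Set[Slice] = set()
    for group, commits in groups.items():
        for commit in sorted(commits)[-x:]:
            kept.add((group, commit))
    return kept


# Five commits; fg1 was rewritten by every commit, fg2 only by commits 1 and 3.
timeline = [1, 2, 3, 4, 5]
groups = {"fg1": [1, 2, 3, 4, 5], "fg2": [1, 3]}

by_commits = keep_latest_commits(groups, timeline, x=2)
by_versions = keep_latest_file_versions(groups, x=2)
print(sorted(by_commits))
print(sorted(by_versions))
```

In this toy run, the commit-based policy is driven by the timeline (only slices from commits 4 and 5, plus the newest slice of fg2), while the version-based policy counts slices per file group and ignores how recent the commits are.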
