I'm not sure we can write a stable version for all of those. DeleteOrphans
and RewriteDataFiles are the big ones for me; I think they would break down
pretty quickly at scale.

My other concern is that I generally don't like it when there are two ways
to do something. One thought here is: do we get far enough if we just run
Spark in local mode for our existing options? That wouldn't require a new
structure, but users would need Spark on the classpath.
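
Going back to the scale concern, here is a minimal sketch of what a local
DeleteOrphans reduces to. It is plain Java with no Iceberg APIs: the class
and method names are hypothetical, and the two inputs stand in for a storage
listing and the set of files referenced by table metadata. Both collections
sit on the heap of a single JVM, which is roughly where I'd expect a local
version to break down for large tables.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Iceberg code: orphan detection as an in-memory set difference.
public class LocalOrphanSketch {

    // Returns every listed path that is not referenced by table metadata.
    // Assumption: both inputs fit in memory on one machine.
    static List<String> findOrphans(Collection<String> listedPaths,
                                    Collection<String> referencedPaths) {
        Set<String> referenced = new HashSet<>(referencedPaths);
        List<String> orphans = new ArrayList<>();
        for (String path : listedPaths) {
            if (!referenced.contains(path)) {
                orphans.add(path);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // Made-up paths for illustration only.
        List<String> listed = List.of("data/a.parquet", "data/b.parquet", "data/tmp-1.parquet");
        List<String> referenced = List.of("data/a.parquet", "data/b.parquet");
        System.out.println(findOrphans(listed, referenced));
    }
}
```

The set difference itself is trivial; the cost is in producing and holding
the two inputs, which grows with the number of files in the table location.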

On Thu, Feb 26, 2026 at 10:02 AM Maximilian Michels <[email protected]> wrote:

> Hey Russell,
>
> I agree, Table API already has ExpireSnapshots and RewriteManifests.
> In that case, the wrappers add two things on top:
>
> 1. Result reporting with actual delete counts across the different
> file types. The current table API doesn't return a result object.
> 2. Consistent API: ActionsProvider would aggregate all available local
> actions in one place for consumers such as CLI tools, testing, etc.
>
> The more interesting actions are the ones without Table API
> equivalents: DeleteOrphanFiles, RewriteTablePath, RewriteDataFiles.
>
> I think it would be useful to be able to run all actions without Spark
> dependencies. What do you think?
>
> Cheers,
> Max
>
>
> On Wed, Feb 25, 2026 at 8:43 PM Russell Spitzer
> <[email protected]> wrote:
> >
> > So for those first two they already exist in our Table.java API
> >
> > table.expireSnapshots()
> >      .expireOlderThan(tsToExpire)
> >      .commit();
> >
> > table.rewriteManifests()
> >      .commit();
> >
> > Only RewriteTablePath doesn't have a local version yet but I think we
> > could possibly add that
> >
> > What were you thinking of adding to the existing apis?
> >
> > On Wed, Feb 25, 2026 at 2:17 AM Maximilian Michels <[email protected]>
> > wrote:
> >>
> >> Hi Russell,
> >>
> >> Exactly, for many actions this is mostly plumbing to make the existing
> >> functionality available.
> >>
> >> >Which ones would you like to add implementations for?
> >>
> >> We can start with some simple ones, e.g. ExpireSnapshots,
> >> RewriteManifests, RewriteTablePath.
> >>
> >> -Max
> >>
> >>
> >> On Tue, Feb 24, 2026 at 5:03 PM Russell Spitzer
> >> <[email protected]> wrote:
> >> >
> >> > We already do have non-distributed versions for a bunch of the
> >> > functionality in core (that's what the actions were based on) so I don't
> >> > think this is a wild idea. Which ones would you like to add implementations
> >> > for?
> >> >
> >> > On Tue, Feb 24, 2026 at 9:23 AM Maximilian Michels <[email protected]>
> >> > wrote:
> >> >>
> >> >> Hi everyone,
> >> >>
> >> >> I've been looking at the Iceberg Actions [1] and noticed many of
> >> >> them don't fundamentally require a distributed engine.
> >> >>
> >> >> Apart from RewriteDataFiles, most of the maintenance tasks are
> >> >> rather lightweight in the processing department. Some of them could
> >> >> probably run faster and with fewer resources locally, backed by a thread
> >> >> pool.
> >> >>
> >> >> I wonder whether Iceberg could benefit from a local implementation
> >> >> for ActionsProvider [2]. We have a lot of the building blocks for these
> >> >> already available in the core.
> >> >>
> >> >> Granted, there are scalability limitations for large tables. Also,
> >> >> it's often more convenient to use existing (distributed) compute
> >> >> infrastructure. Yet, there are use cases where distributed computing isn't
> >> >> strictly required. For example:
> >> >>
> >> >>   - CLI tooling
> >> >>   - CI/CD pipelines and automation scripts
> >> >>   - REST catalog backends which want to run maintenance internally
> >> >>   - Small tables in general
> >> >>   - Environments where Flink/Spark are not available
> >> >>
> >> >> I'm curious to hear your thoughts.
> >> >>
> >> >> Cheers,
> >> >> Max
> >> >>
> >> >> [1] https://github.com/apache/iceberg/tree/501824f0c0032b3225b0fe52b904756f0fe5c589/api/src/main/java/org/apache/iceberg/actions
> >> >> [2] https://github.com/apache/iceberg/blob/501824f0c0032b3225b0fe52b904756f0fe5c589/api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java#L24
>
