We already do have non-distributed versions for a bunch of the functionality in core (that's what the actions were based on) so I don't think this is a wild idea. Which ones would you like to add implementations for?
On Tue, Feb 24, 2026 at 9:23 AM Maximilian Michels <[email protected]> wrote:

> Hi everyone,
>
> I've been looking at the Iceberg Actions [1] and noticed many of them
> don't fundamentally require a distributed engine.
>
> Apart from RewriteDataFiles, most of the maintenance tasks are rather
> lightweight in the processing department. Some of them could probably run
> faster and with fewer resources locally, backed by a thread pool.
>
> I wonder whether Iceberg could benefit from a local implementation for
> ActionsProvider [2]. We have a lot of the building blocks for these already
> available in the core.
>
> Granted, there are scalability limitations for large tables. Also, it's
> often more convenient to use existing (distributed) compute infrastructure.
> Yet, there are use cases where distributed computing isn't strictly
> required. For example:
>
> - CLI tooling
> - CI/CD pipelines and automation scripts
> - REST catalog backends which want to run maintenance internally
> - Small tables in general
> - Environments where Flink/Spark are not available
>
> I'm curious to hear your thoughts.
>
> Cheers,
> Max
>
> [1]
> https://github.com/apache/iceberg/tree/501824f0c0032b3225b0fe52b904756f0fe5c589/api/src/main/java/org/apache/iceberg/actions
> [2]
> https://github.com/apache/iceberg/blob/501824f0c0032b3225b0fe52b904756f0fe5c589/api/src/main/java/org/apache/iceberg/actions/ActionsProvider.java#L24
