Hi all,

Thanks Adam! You're spot on!

I wanted to separate the discussions about this functionality and the
related async & reliable tasks proposal [1].

This one is generally intended for long(er)-running operations against
object stores, and it already provides what is needed to fix the
existing OOM issue.

Robert

[1] https://lists.apache.org/thread/kqm0w38p7bnojq455yz7d2vdfp6ky1h7

On Fri, Dec 19, 2025 at 3:43 PM Adam Christian
<[email protected]> wrote:
>
> Howdy Robert,
>
> I reviewed the PR and it is very clean. I really enjoy the simplicity of
> the FileOperations interface. I think that's going to be a great extension
> point.
>
> One of the things I was wondering about was what to do with the current
> purge implementation. I understand that it has a high likelihood of having
> an Out of Memory exception [1]. I'm wondering if your idea was to build
> this, then replace the current implementation. I'd love to ensure that we
> have a plan for one clean, stable implementation.
>
> [1] - https://github.com/apache/polaris/issues/2365
>
> Go community,
>
> Adam
>
> On Tue, Dec 16, 2025 at 10:25 AM Adam Christian <
> [email protected]> wrote:
>
> > Hi Yufei,
> >
> > Great questions!
> >
> > From what I can see in the PR, here are the answers to your questions:
> > 1. The first major scenario is improving the memory concerns with purge.
> > That's important to stabilize a core use case in the service.
> > 2. These are related specifically to file operations. I cannot see a way
> > that it would be broader than that.
> >
> > Go community,
> >
> > Adam
> >
> > On Mon, Dec 15, 2025, 3:20 PM Yufei Gu <[email protected]> wrote:
> >
> >> Hi Robert,
> >>
> >> Thanks for sharing the proposal and the PR. Before diving deeper into the
> >> API shape, I was hoping to better understand the intended use cases you
> >> have in mind:
> >>
> >> 1. What concrete scenarios are you primarily targeting with these
> >> long-running object store operations?
> >> 2. Are these mostly expected to be file/object-level maintenance tasks
> >> (e.g. purge, cleanup), or do you envision broader categories of operations
> >> leveraging the same abstraction?
> >>
> >> Having a clearer picture of the motivating use cases would help evaluate
> >> the right level of abstraction and where this should live architecturally.
> >>
> >> Looking forward to the discussion.
> >>
> >> Yufei
> >>
> >>
> >> On Fri, Dec 12, 2025 at 3:48 AM Robert Stupp <[email protected]> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I'd like to propose an API and corresponding implementation for (long
> >> > running) object store operations.
> >> >
> >> > It provides a CPU and heap-friendly API and implementation to work
> >> > against object stores. It is built in a way to provide "pluggable"
> >> > functionality. What I mean is this (Java pseudo code):
> >> > ---
> >> > FileOperations fileOps =
> >> >     fileOperationsFactory.createFileOperations(fileIoInstance);
> >> > Stream<FileSpec> allIcebergTableFiles =
> >> >     fileOps.identifyIcebergTableFiles(metadataLocation);
> >> > PurgeStats purged = fileOps.purge(allIcebergTableFiles);
> >> > // or simpler:
> >> > PurgeStats purged = fileOps.purgeIcebergTable(metadataLocation);
> >> > // or similarly for Iceberg views
> >> > PurgeStats purged = fileOps.purgeIcebergView(metadataLocation);
> >> > // or to purge all files underneath a prefix
> >> > PurgeStats purged = fileOps.purge(fileOps.findFiles(prefix));
> >> > ---
> >> >
> >> > Not mentioned in the pseudo code is the ability to rate-limit the
> >> > number of purged files or batch-deletions and configure the deletion
> >> > batch-size.
> >> >
> >> > The PR [1] already contains tests against an on-heap object store mock
> >> > and integration tests against S3/GCS/Azure emulators.
> >> >
> >> > More details can be found in the README [2] included in the PR and of
> >> > course in the code in the PR.
> >> >
> >> > Robert
> >> >
> >> > [1] https://github.com/apache/polaris/pull/3256
> >> > [2] https://github.com/snazy/polaris/blob/obj-store-ops/storage/files/README.md
> >> >
> >>
> >
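
The original proposal mentions a configurable deletion batch size and rate limiting, but its pseudo code does not show them. Below is a minimal sketch of the general idea, not the PR's implementation: consume a stream of file locations, delete in fixed-size batches, and pause between batches, so that only one batch is buffered at a time (assuming the stream itself is lazy). Every name here (BatchedPurgeSketch, purgeInBatches, Stats, bulkDelete, pauseMillis) is hypothetical, and the fixed pause is only a stand-in for whatever rate limiter the PR actually uses.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

final class BatchedPurgeSketch {

  /** Hypothetical counters; the PR's PurgeStats may look different. */
  static final class Stats {
    final long files;
    final long batches;

    Stats(long files, long batches) {
      this.files = files;
      this.batches = batches;
    }
  }

  /**
   * Deletes locations in batches of at most batchSize, sleeping pauseMillis
   * between batches. bulkDelete is an assumed callback standing in for the
   * object store's bulk-delete call.
   */
  static Stats purgeInBatches(
      Stream<String> locations,
      int batchSize,
      long pauseMillis,
      Consumer<List<String>> bulkDelete) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    long files = 0;
    long batches = 0;
    List<String> batch = new ArrayList<>(batchSize);
    Iterator<String> it = locations.iterator();
    while (it.hasNext()) {
      batch.add(it.next());
      boolean last = !it.hasNext();
      if (batch.size() == batchSize || last) {
        // Hand over an immutable copy, then reuse the buffer.
        bulkDelete.accept(List.copyOf(batch));
        files += batch.size();
        batches++;
        batch.clear();
        if (!last && pauseMillis > 0) {
          try {
            Thread.sleep(pauseMillis); // crude rate limit between batches
          } catch (InterruptedException e) {
            // Stop early and report what was purged so far.
            Thread.currentThread().interrupt();
            break;
          }
        }
      }
    }
    return new Stats(files, batches);
  }

  public static void main(String[] args) {
    List<String> deleted = new ArrayList<>();
    Stats stats =
        purgeInBatches(
            Stream.of("s3://bucket/a", "s3://bucket/b", "s3://bucket/c"),
            2,
            10,
            deleted::addAll);
    System.out.println(stats.files + " files in " + stats.batches + " batches");
  }
}
```

Taking the locations as a Stream rather than a List is what keeps the heap footprint bounded by the batch size in this sketch; collecting all locations up front would reintroduce the memory pressure discussed in the thread.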
