Howdy Robert,

I reviewed the PR and it is very clean. I really enjoy the simplicity of
the FileOperations interface. I think that's going to be a great extension
point.

One thing I was wondering about is what to do with the current purge
implementation. I understand that it is prone to running out of memory
[1]. Is your plan to build this and then replace the current
implementation with it? I'd love to ensure that we end up with a single
clean, stable implementation.
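
To make the memory concern concrete, here is a minimal sketch of the
general idea as I understand it: consume the file listing lazily and
delete in fixed-size batches, so that only one batch is held on the heap
at a time. This is not code from the PR. The class name, the
purgeInBatches method, and the deleteBatch callback are all hypothetical
and only stand in for whatever the real FileOperations implementation does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

// Hypothetical illustration only; none of these names come from the PR.
public class BatchedPurgeSketch {

    /**
     * Pulls file locations from the stream one at a time and hands them to
     * deleteBatch in groups of at most batchSize. Returns the number of
     * locations passed to the callback.
     */
    static long purgeInBatches(Stream<String> files, int batchSize,
                               Consumer<List<String>> deleteBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long purged = 0;
        List<String> batch = new ArrayList<>(batchSize);
        // Close the stream when done, in case it wraps a listing resource.
        try (Stream<String> source = files) {
            Iterator<String> it = source.iterator();
            while (it.hasNext()) {
                batch.add(it.next());
                if (batch.size() == batchSize) {
                    deleteBatch.accept(List.copyOf(batch));
                    purged += batch.size();
                    batch.clear();
                }
            }
        }
        // Flush the final, possibly smaller, batch.
        if (!batch.isEmpty()) {
            deleteBatch.accept(List.copyOf(batch));
            purged += batch.size();
        }
        return purged;
    }

    public static void main(String[] args) {
        // Made-up locations, generated lazily rather than collected up front.
        Stream<String> files = Stream.iterate(0, i -> i + 1)
                .limit(2500)
                .map(i -> "s3://bucket/table/data/file-" + i + ".parquet");
        List<Integer> batchSizes = new ArrayList<>();
        long purged = purgeInBatches(files, 1000, b -> batchSizes.add(b.size()));
        // prints: 2500 files in batches [1000, 1000, 500]
        System.out.println(purged + " files in batches " + batchSizes);
    }
}
```

In this sketch heap usage is bounded by the batch size rather than by
the total number of files, which is the property I would hope the
replacement for the current purge keeps.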

[1] - https://github.com/apache/polaris/issues/2365

Go community,

Adam

On Tue, Dec 16, 2025 at 10:25 AM Adam Christian <[email protected]> wrote:

> Hi Yufei,
>
> Great questions!
>
> From what I can see in the PR, here are the answers to your questions:
> 1. The first major scenario is addressing the memory concerns with purge.
> That's important to stabilize a core use case in the service.
> 2. These are related specifically to file operations. I cannot see a way
> that it would be broader than that.
>
> Go community,
>
> Adam
>
> On Mon, Dec 15, 2025, 3:20 PM Yufei Gu <[email protected]> wrote:
>
>> Hi Robert,
>>
>> Thanks for sharing the proposal and the PR. Before diving deeper into the
>> API shape, I was hoping to better understand the intended use cases you
>> have in mind:
>>
>> 1. What concrete scenarios are you primarily targeting with these
>> long-running object store operations?
>> 2. Are these mostly expected to be file/object-level maintenance tasks
>> (e.g. purge, cleanup), or do you envision broader categories of operations
>> leveraging the same abstraction?
>>
>> Having a clearer picture of the motivating use cases would help evaluate
>> the right level of abstraction and where this should live architecturally.
>>
>> Looking forward to the discussion.
>>
>> Yufei
>>
>>
>> On Fri, Dec 12, 2025 at 3:48 AM Robert Stupp <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > I'd like to propose an API and corresponding implementation for (long
>> > running) object store operations.
>> >
>> > It provides a CPU and heap-friendly API and implementation to work
>> > against object stores. It is built in a way to provide "pluggable"
>> > functionality. What I mean is this (Java pseudo code):
>> > ---
>> > FileOperations fileOps =
>> > fileOperationsFactory.createFileOperations(fileIoInstance);
>> > Stream<FileSpec> allIcebergTableFiles = fileOps.
>> >     identifyIcebergTableFiles(metadataLocation);
>> > PurgeStats purged = fileOps.purge(allIcebergTableFiles);
>> > // or simpler:
>> > PurgeStats purged = fileOps.purgeIcebergTable(metadataLocation);
>> > // or similarly for Iceberg views
>> > PurgeStats purged = fileOps.purgeIcebergView(metadataLocation);
>> > // or to purge all files underneath a prefix
>> > PurgeStats purged = fileOps.purge(fileOps.findFiles(prefix));
>> > ---
>> >
>> > Not mentioned in the pseudo code is the ability to rate-limit the
>> > number of purged files or batch-deletions and configure the deletion
>> > batch-size.
>> >
>> > The PR already contains tests against an on-heap object store mock and
>> > integration tests against S3/GCS/Azure emulators.
>> >
>> > More details can be found in the README [2] included in the PR and of
>> > course in the code in the PR.
>> >
>> > Robert
>> >
>> > [1] https://github.com/apache/polaris/pull/3256
>> > [2] https://github.com/snazy/polaris/blob/obj-store-ops/storage/files/README.md
>> >
>>
>
