Howdy Robert,

I reviewed the PR and it is very clean. I really enjoy the simplicity of the FileOperations interface. I think that's going to be a great extension point.

One of the things I was wondering about is what to do with the current purge implementation. I understand that it is likely to hit an out-of-memory error [1]. I'm wondering if your idea was to build this, then replace the current implementation. I'd love to ensure that we have a plan for one clean, stable implementation.

[1] - https://github.com/apache/polaris/issues/2365

Go community,

Adam

On Tue, Dec 16, 2025 at 10:25 AM Adam Christian <[email protected]> wrote:

> Hi Yufei,
>
> Great questions!
>
> From what I can see in the PR, here are the answers to your questions:
> 1. The first major scenario is improving the memory concerns with purge.
> That's important to stabilize a core use case in the service.
> 2. These are related specifically to file operations. I cannot see a way
> that it would be broader than that.
>
> Go community,
>
> Adam
>
> On Mon, Dec 15, 2025, 3:20 PM Yufei Gu <[email protected]> wrote:
>
>> Hi Robert,
>>
>> Thanks for sharing the proposal and the PR. Before diving deeper into the
>> API shape, I was hoping to better understand the intended use cases you
>> have in mind:
>>
>> 1. What concrete scenarios are you primarily targeting with these
>> long-running object store operations?
>> 2. Are these mostly expected to be file/object-level maintenance tasks
>> (e.g. purge, cleanup), or do you envision broader categories of operations
>> leveraging the same abstraction?
>>
>> Having a clearer picture of the motivating use cases would help evaluate
>> the right level of abstraction and where this should live architecturally.
>>
>> Looking forward to the discussion.
>>
>> Yufei
>>
>>
>> On Fri, Dec 12, 2025 at 3:48 AM Robert Stupp <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > I'd like to propose an API and corresponding implementation for (long
>> > running) object store operations.
>> >
>> > It provides a CPU and heap-friendly API and implementation to work
>> > against object stores. It is built in a way to provide "pluggable"
>> > functionality. What I mean is this (Java pseudo code):
>> > ---
>> > FileOperations fileOps =
>> > fileOperationsFactory.createFileOperations(fileIoInstance);
>> > Stream<FileSpec> allIcebergTableFiles = fileOps.
>> > identifyIcebergTableFiles(metadataLocation);
>> > PurgeStats purged = fileOps.purge(allIcebergTableFiles);
>> > // or simpler:
>> > PurgeStats purged = fileOps.purgeIcebergTable(metadataLocation);
>> > // or similarly for Iceberg views
>> > PurgeStats purged = fileOps.purgeIcebergView(metadataLocation);
>> > // or to purge all files underneath a prefix
>> > PurgeStats purged = fileOps.purge(fileOps.findFiles(prefix));
>> > ---
>> >
>> > Not mentioned in the pseudo code is the ability to rate-limit the
>> > number of purged files or batch-deletions and configure the deletion
>> > batch-size.
>> >
>> > The PR already contains tests against an on-heap object store mock and
>> > integration tests against S3/GCS/Azure emulators.
>> >
>> > More details can be found in the README [2] included in the PR and of
>> > course in the code in the PR.
>> >
>> > Robert
>> >
>> > [1] https://github.com/apache/polaris/pull/3256
>> > [2]
>> > https://github.com/snazy/polaris/blob/obj-store-ops/storage/files/README.md
>> >
>> >
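
To make the heap concern in this thread concrete, here is a minimal sketch of how a purge could consume a lazy stream of file locations in fixed-size batches, so that only one batch is held in memory at a time. This is not the code from the PR: `BatchedPurgeSketch`, `purgeInBatches`, `PurgeResult`, and the `bulkDelete` callback are hypothetical names used only for illustration, and file locations are simplified to plain strings rather than the `FileSpec` type mentioned above. Rate limiting is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class BatchedPurgeSketch {

  /** Hypothetical result holder; NOT the PurgeStats type from the PR. */
  public static final class PurgeResult {
    public final long filesDeleted;
    public final long batches;

    PurgeResult(long filesDeleted, long batches) {
      this.filesDeleted = filesDeleted;
      this.batches = batches;
    }
  }

  /**
   * Drains the stream lazily and hands at most batchSize locations at a time
   * to bulkDelete (an assumed stand-in for an object store bulk-delete call).
   */
  public static PurgeResult purgeInBatches(
      Stream<String> files, int batchSize, Consumer<List<String>> bulkDelete) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    long deleted = 0;
    long batches = 0;
    List<String> batch = new ArrayList<>(batchSize);
    // Closing the stream releases any resources held by its source.
    try (Stream<String> source = files) {
      Iterator<String> it = source.iterator();
      while (it.hasNext()) {
        batch.add(it.next());
        if (batch.size() == batchSize) {
          // Pass a copy so the callback never sees the reused buffer.
          bulkDelete.accept(new ArrayList<>(batch));
          deleted += batch.size();
          batches++;
          batch.clear();
        }
      }
      // Flush the final, possibly smaller, batch.
      if (!batch.isEmpty()) {
        bulkDelete.accept(new ArrayList<>(batch));
        deleted += batch.size();
        batches++;
      }
    }
    return new PurgeResult(deleted, batches);
  }

  public static void main(String[] args) {
    // A lazily generated listing; nothing is materialized up front.
    Stream<String> files =
        Stream.iterate(0, i -> i + 1)
            .limit(2500)
            .map(i -> "s3://bucket/table/data/file-" + i + ".parquet");
    PurgeResult result = purgeInBatches(files, 1000, batch -> {});
    // prints "deleted=2500 batches=3"
    System.out.println("deleted=" + result.filesDeleted + " batches=" + result.batches);
  }
}
```

The point of the sketch is that peak memory depends on the batch size, not on the total number of files in the table, which is the property the existing purge path reportedly lacks.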
