I think this matches much of the original non-Spark reading and planning code in the Iceberg project. The only big difference I can see is the handling of the manifests themselves, which were originally streamed in the Iceberg version (see ManifestGroup for context, and for nightmares). I can see that we are worried about leaving open file handles here, so that is an acceptable tradeoff if we want to make it.
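
For reference, the streaming shape I mean looks roughly like this. It is only a sketch against recent Iceberg APIs (Snapshot.dataManifests(FileIO) and ManifestFiles.read may differ slightly between versions, and the helper class is just something I made up for illustration); the point is that the try-with-resources block keeps at most one manifest reader, and therefore one open file handle, alive at a time:
---
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

import org.apache.iceberg.DataFile;
import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.ManifestFiles;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.io.FileIO;

class StreamingManifestScan {
  // Visits every data file referenced by a snapshot while keeping at most one
  // manifest reader open at a time, instead of materializing all manifests.
  static void visitDataFiles(Snapshot snapshot, FileIO io, Consumer<DataFile> visitor) {
    for (ManifestFile manifest : snapshot.dataManifests(io)) {
      // ManifestFiles.read streams entries from a single manifest; closing the
      // reader promptly is what avoids leaking open file handles.
      try (CloseableIterable<DataFile> entries = ManifestFiles.read(manifest, io)) {
        entries.forEach(visitor);
      } catch (IOException e) {
        throw new UncheckedIOException("Failed to read/close manifest " + manifest.path(), e);
      }
    }
  }
}
---
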
On Thu, Jan 8, 2026 at 4:04 AM Pierre Laporte <[email protected]> wrote:
> As far as I can tell, here is the space complexity for each method. The
> names used correspond to:
>
> * PM = number of previous metadata files
> * S = number of snapshots
> * ST = number of statistics files
> * PST = number of partition statistics files
> * UM = number of unique manifest files across all snapshots
> * T = total number of created TaskEntities
>
> The getMetadataFileBatches method has a space complexity of `O(PM + S + ST + PST)`.
> Same thing for the getMetadataTaskStream method.
> The getManifestTaskStream method has a space complexity of `O(UM)`.
> The handleTask method has a space complexity of `O(UM + PM + S + ST + PST + T)`.
>
> Based on those elements, it is clear that the current implementation will
> run into heap pressure for tables with many snapshots and frequent
> updates, or tables with long metadata history.
>
> On Wed, Jan 7, 2026 at 9:55 PM Yufei Gu <[email protected]> wrote:
> > Hi all,
> >
> > After taking a closer look, I am not sure the issue as currently
> > described is actually valid.
> >
> > The base64 encoded manifest objects[1] being discussed are not the
> > manifest files themselves. They are objects representing manifest files,
> > which can be reconstructed from the manifest entries stored in the
> > ManifestList files. As a result, the in memory footprint should be
> > roughly equivalent to the size of a single manifest list file per
> > snapshot, plus some additional base64 encoding overhead. That overhead
> > does not seem significant enough on its own to explain large heap
> > pressure.
> >
> > This pattern is also handled in practice today. For example, multiple
> > Spark procedures/actions and Spark planning process these manifest
> > representations within a single node, typically the driver, without
> > materializing full manifest files in memory. One concrete example is
> > here:
> >
> > https://github.com/apache/iceberg/blob/bf1074ff373c929614a3f92dd4e46780028ac1ca/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java#L290
> >
> > Given this, I am not convinced that embedding these manifest
> > representations is inherently problematic from a memory perspective. If
> > there are concrete scenarios where this still leads to excessive memory
> > usage, it would be helpful to clarify where the amplification happens
> > beyond what is already expected from manifest list processing.
> >
> > Happy to be corrected if I am missing something, but wanted to share
> > this observation before we anchor further design decisions on this
> > assumption.
> >
> > 1. https://github.com/apache/polaris/blob/c9efc6c1af202686945efe2e19125e8f116a0206/runtime/service/src/main/java/org/apache/polaris/service/task/TableCleanupTaskHandler.java#L194
> >
> > Yufei
> >
> > On Tue, Jan 6, 2026 at 8:46 PM Yufei Gu <[email protected]> wrote:
> > > Thanks Adam and Robert for the replies.
> > >
> > > Just to make sure I am understanding this correctly.
> > >
> > > The core issue the proposal is addressing is described in
> > > https://github.com/apache/polaris/issues/2365 that, today, the full
> > > binary Iceberg manifest files, base64 encoded, are embedded in the
> > > task entities. As a result, when a purge runs, all manifests for a
> > > table end up being materialized in memory at once. This behavior
> > > creates significant heap pressure and may lead to out of memory
> > > failures during purge operations.
> > >
> > > Please let me know if this matches your intent, or if I am missing
> > > anything.
> > >
> > > Yufei
> > >
> > > On Sat, Dec 20, 2025 at 4:53 AM Robert Stupp <[email protected]> wrote:
> > >> Hi all,
> > >>
> > >> Thanks Adam! You're spot on!
> > >>
> > >> I wanted to separate the discussions about this functionality and the
> > >> related async & reliable tasks proposal.
> > >>
> > >> The functionality of this one is generally intended for long(er)
> > >> running operations against object stores, and already provides the
> > >> necessary functionality to fix the existing OOM issue.
> > >>
> > >> Robert
> > >>
> > >> [1] https://lists.apache.org/thread/kqm0w38p7bnojq455yz7d2vdfp6ky1h7
> > >>
> > >> On Fri, Dec 19, 2025 at 3:43 PM Adam Christian
> > >> <[email protected]> wrote:
> > >> > Howdy Robert,
> > >> >
> > >> > I reviewed the PR and it is very clean. I really enjoy the
> > >> > simplicity of the FileOperations interface. I think that's going to
> > >> > be a great extension point.
> > >> >
> > >> > One of the things I was wondering about was what to do with the
> > >> > current purge implementation. I understand that it has a high
> > >> > likelihood of having an Out of Memory exception [1]. I'm wondering
> > >> > if your idea was to build this, then replace the current
> > >> > implementation. I'd love to ensure that we have a plan for one
> > >> > clean, stable implementation.
> > >> >
> > >> > [1] - https://github.com/apache/polaris/issues/2365
> > >> >
> > >> > Go community,
> > >> >
> > >> > Adam
> > >> >
> > >> > On Tue, Dec 16, 2025 at 10:25 AM Adam Christian
> > >> > <[email protected]> wrote:
> > >> > > Hi Yufei,
> > >> > >
> > >> > > Great questions!
> > >> > >
> > >> > > From what I can see in the PR, here are the answers to your questions:
> > >> > > 1. The first major scenario is improving the memory concerns with
> > >> > > purge. That's important to stabilize a core use case in the service.
> > >> > > 2. These are related specifically to file operations. I cannot see
> > >> > > a way that it would be broader than that.
> > >> > >
> > >> > > Go community,
> > >> > >
> > >> > > Adam
> > >> > >
> > >> > > On Mon, Dec 15, 2025, 3:20 PM Yufei Gu <[email protected]> wrote:
> > >> > >> Hi Robert,
> > >> > >>
> > >> > >> Thanks for sharing the proposal and the PR. Before diving deeper
> > >> > >> into the API shape, I was hoping to better understand the
> > >> > >> intended use cases you have in mind:
> > >> > >>
> > >> > >> 1. What concrete scenarios are you primarily targeting with these
> > >> > >> long-running object store operations?
> > >> > >> 2. Are these mostly expected to be file/object-level maintenance
> > >> > >> tasks (e.g. purge, cleanup), or do you envision broader
> > >> > >> categories of operations leveraging the same abstraction?
> > >> > >>
> > >> > >> Having a clearer picture of the motivating use cases would help
> > >> > >> evaluate the right level of abstraction and where this should
> > >> > >> live architecturally.
> > >> > >>
> > >> > >> Looking forward to the discussion.
> > >> > >>
> > >> > >> Yufei
> > >> > >>
> > >> > >> On Fri, Dec 12, 2025 at 3:48 AM Robert Stupp <[email protected]> wrote:
> > >> > >> > Hi all,
> > >> > >> >
> > >> > >> > I'd like to propose an API and corresponding implementation for
> > >> > >> > (long running) object store operations.
> > >> > >> >
> > >> > >> > It provides a CPU and heap-friendly API and implementation to
> > >> > >> > work against object stores. It is built in a way to provide
> > >> > >> > "pluggable" functionality. What I mean is this (Java pseudo code):
> > >> > >> > ---
> > >> > >> > FileOperations fileOps =
> > >> > >> >     fileOperationsFactory.createFileOperations(fileIoInstance);
> > >> > >> > Stream<FileSpec> allIcebergTableFiles =
> > >> > >> >     fileOps.identifyIcebergTableFiles(metadataLocation);
> > >> > >> > PurgeStats purged = fileOps.purge(allIcebergTableFiles);
> > >> > >> > // or simpler:
> > >> > >> > PurgeStats purged = fileOps.purgeIcebergTable(metadataLocation);
> > >> > >> > // or similarly for Iceberg views
> > >> > >> > PurgeStats purged = fileOps.purgeIcebergView(metadataLocation);
> > >> > >> > // or to purge all files underneath a prefix
> > >> > >> > PurgeStats purged = fileOps.purge(fileOps.findFiles(prefix));
> > >> > >> > ---
> > >> > >> >
> > >> > >> > Not mentioned in the pseudo code is the ability to rate-limit
> > >> > >> > the number of purged files or batch-deletions and configure the
> > >> > >> > deletion batch-size.
> > >> > >> >
> > >> > >> > The PR already contains tests against an on-heap object store
> > >> > >> > mock and integration tests against S3/GCS/Azure emulators.
> > >> > >> >
> > >> > >> > More details can be found in the README [2] included in the PR
> > >> > >> > and of course in the code in the PR.
> > >> > >> >
> > >> > >> > Robert
> > >> > >> >
> > >> > >> > [1] https://github.com/apache/polaris/pull/3256
> > >> > >> > [2] https://github.com/snazy/polaris/blob/obj-store-ops/storage/files/README.md
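
One more sketch to make the memory angle concrete: this is roughly how a purge task could consume the proposed API so that file discovery stays lazy instead of being serialized into task entities. The FileOperations, FileSpec and PurgeStats names are taken from Robert's pseudo code above; the factory type, the logger, and the try-with-resources around the stream are my own assumptions, not the actual PR surface:
---
// Illustrative only: drives a purge off the proposed API without embedding
// any manifest content in the task entity itself (the issue #2365 concern).
static void purgeIcebergTable(FileOperationsFactory fileOperationsFactory, // assumed factory type
                              FileIO fileIoInstance,
                              String metadataLocation) {
  FileOperations fileOps =
      fileOperationsFactory.createFileOperations(fileIoInstance);
  // identifyIcebergTableFiles(...) is assumed to stream FileSpec entries
  // lazily; the try-with-resources releases whatever the stream holds open.
  try (Stream<FileSpec> files =
      fileOps.identifyIcebergTableFiles(metadataLocation)) {
    PurgeStats purged = fileOps.purge(files);
    LOGGER.info("Purge finished: {}", purged); // LOGGER is a placeholder
  }
}
---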
