> If your position now is [...]

My position has not changed.

> a simple solution like this, https://github.com/apache/polaris/pull/3415.

Have you tested this solution? It seems to me that the space complexity in this code has not changed at all...
--
Pierre

On Sun, Jan 11, 2026 at 8:57 PM Yufei Gu <[email protected]> wrote:

> it applies regardless of the final answer to the base64 question.

Pierre, this is exactly the point where I disagree. The statement that all manifest files are materialized at once being "the cause of the reported heap pressure" is a claim that has been made in the issue https://github.com/apache/polaris/issues/2365. I don't see how any misunderstanding could arise from that. If your position now is that heap pressure exists regardless of base64 encoding of all manifest files, due to the inherent cost of traversing large metadata histories, then that is a different explanation and the original attribution should be corrected. We should be explicit about it in the issue and on the dev mailing list. Treating these as equivalent is not accurate.

BTW, for the second issue claimed, we could have a simple solution like this: https://github.com/apache/polaris/pull/3415.

Yufei

On Fri, Jan 9, 2026 at 7:02 AM Adam Christian <[email protected]> wrote:

So, I wanted to bring all discussions into this thread, as we are jumping between the issue filed, this thread, and the GitHub PR.

In regards to Yufei's comment on the PR [1], sorry if I wasn't clear when Yufei, Prashant, and I had our conversation. From what I remember, there were two points brought up:
1. I wanted to check out Yufei's comment on the memory consumption analysis. I did NOT recommend pausing the work. I did want to do an investigation on my side into Yufei's comment on the GitHub issue.
2. I did want to see what other options there were for implementation, but those can be done as a follow-up. Prashant had mentioned leveraging AllEntriesTable [2].

Let me address these one by one.

On the memory pressure analysis:
1. I agree with Yufei that the encoding is just on the ManifestFile metadata object, not the manifest file content. That alone should not explain significant memory pressure.
2. However, this is NOT what I read the ticket to be about, since the ticket is titled "Purge table task implementation prone to OOMs" and, as Pierre pointed out, there is the line "Although some Java stream handling is being used, all manifest-files of the table to purge are materialized at once on the Java heap."
3. I agree with Pierre's analysis of the memory consumption patterns of TableCleanupTaskHandler, and I believe that our current implementation can be improved.
4. I believe that Robert's PR improves our current implementation and, because it is an incremental way to get us to a more stable product, we should get it in.
5. I believe that improvements to the current codebase should not be hindered by future concerns like the asynchronous task or delegation service work, as long as they do not block those future decisions. This PR is orthogonal to those decisions. It helps improve existing functionality and can be used however we decide to move forward in the future.

On the other options for implementation:
1. I agree with Prashant that it would be great to reuse existing functionality when it applies to our use cases. I did not know about the AllEntriesTable functionality in Iceberg before, and it could be fruitful for someone to do that analysis in the future (see the sketch after this list).
2. That being said, I do NOT believe that we need to wait for someone to do that analysis before moving forward. The beauty of having the separation between the File Operations APIs and the implementation modules is that we can change the implementation when we find a better one.
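As a starting point for that analysis, here is a minimal sketch of scanning the all_entries metadata table incrementally. This is illustrative only and is not code from the PR; `table` is assumed to be an Iceberg `Table` handle, and `processEntries` is a hypothetical consumer:
---
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.MetadataTableType;
import org.apache.iceberg.MetadataTableUtils;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;

// Wrap the base table as its "all_entries" metadata table.
Table allEntries = MetadataTableUtils.createMetadataTableInstance(
    table, MetadataTableType.ALL_ENTRIES);

// planFiles() yields scan tasks over the underlying manifests, so entries
// can be consumed a manifest at a time instead of all at once.
try (CloseableIterable<FileScanTask> tasks = allEntries.newScan().planFiles()) {
  for (FileScanTask task : tasks) {
    processEntries(task); // hypothetical: read and handle one manifest's entries
  }
}
---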
So, personally, I would like to get this work in and start alleviating the community of any potential issues with the current purge table operation.

[1] https://github.com/apache/polaris/pull/3256#issuecomment-3725228243
[2] https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/AllEntriesTable.java

Go community,

Adam

On Fri, Jan 9, 2026 at 4:33 AM Pierre Laporte <[email protected]> wrote:

Oh, I see where the misunderstanding is. At this time, what you are questioning is this sentence from the #2365 original bug report:

> The base64 encoded full binary Iceberg manifest files are included in the task entities

Now, consider the sentence that immediately follows:

> Although some Java stream handling is being used, all manifest-files of the table to purge are materialized at once on the Java heap.

This is indeed what my analysis is about, as it is what defines the memory consumption pattern of the current logic. Because it is the cause of the reported heap pressure, it applies regardless of the final answer to the base64 question.

--
Pierre

On Fri, Jan 9, 2026 at 5:58 AM Yufei Gu <[email protected]> wrote:

> The getMetadataFileBatches method has a space complexity of `O(PM + S + ST + PST)`.
> Same thing for the getMetadataTaskStream method.
> The getManifestTaskStream method has a space complexity of `O(UM)`.
> The handleTask method has a space complexity of `O(UM + PM + S + ST + PST + T)`.

Pierre, thanks for the analysis. I think this response is addressing a slightly different issue than the one originally raised. The initial concern was that embedding base64 encoded manifest files causes excessive heap usage. The new complexity breakdown instead argues that operations scale with table history and activity. If the concern is really about the cost of operating on large tables with long metadata history, that seems like a broader, pre-existing characteristic rather than a defect in this specific approach. In that case, it may be clearer to track this as a separate issue or PR. For reference, there were discussions of moving heavy workloads like this out of the Polaris instance to a delegation service.

Yufei

On Thu, Jan 8, 2026 at 2:04 AM Pierre Laporte <[email protected]> wrote:

As far as I can tell, here is the space complexity for each method. The names used correspond to:

* PM = number of previous metadata files
* S = number of snapshots
* ST = number of statistics files
* PST = number of partition statistics files
* UM = number of unique manifest files across all snapshots
* T = total number of created TaskEntities

The getMetadataFileBatches method has a space complexity of `O(PM + S + ST + PST)`.
Same thing for the getMetadataTaskStream method.
The getManifestTaskStream method has a space complexity of `O(UM)`.
The handleTask method has a space complexity of `O(UM + PM + S + ST + PST + T)`.

Based on those elements, it is clear that the current implementation will run into heap pressure for tables with many snapshots and frequent updates, or tables with long metadata history.
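To make the shape of that pressure concrete, here is a schematic sketch, not the actual TableCleanupTaskHandler code, contrasting full materialization with bounded batches; `snapshots`, `io`, `BATCH`, and `processBatch` are stand-ins:
---
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.Snapshot;

// Materialized at once: peak heap grows with O(UM).
List<ManifestFile> all = snapshots.stream()
    .flatMap(snap -> snap.allManifests(io).stream())
    .distinct()
    .collect(Collectors.toList());

// Bounded batches: peak heap for ManifestFile objects stays at O(BATCH),
// though de-duplication still needs a set of seen manifest paths.
Set<String> seen = new HashSet<>();
List<ManifestFile> batch = new ArrayList<>(BATCH);
for (Snapshot snap : snapshots) {
  for (ManifestFile mf : snap.allManifests(io)) {
    if (seen.add(mf.path())) {
      batch.add(mf);
      if (batch.size() == BATCH) {
        processBatch(batch); // hypothetical downstream task creation
        batch.clear();
      }
    }
  }
}
if (!batch.isEmpty()) {
  processBatch(batch);
}
---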
On Wed, Jan 7, 2026 at 9:55 PM Yufei Gu <[email protected]> wrote:

Hi all,

After taking a closer look, I am not sure the issue as currently described is actually valid.

The base64 encoded manifest objects [1] being discussed are not the manifest files themselves. They are objects representing manifest files, which can be reconstructed from the manifest entries stored in the manifest list files. As a result, the in-memory footprint should be roughly equivalent to the size of a single manifest list file per snapshot, plus some additional base64 encoding overhead. That overhead does not seem significant enough on its own to explain large heap pressure.

This pattern is also handled in practice today. For example, multiple Spark procedures/actions and Spark planning process these manifest representations within a single node, typically the driver, without materializing full manifest files in memory. One concrete example is here:

https://github.com/apache/iceberg/blob/bf1074ff373c929614a3f92dd4e46780028ac1ca/spark/v4.0/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteTablePathSparkAction.java#L290

Given this, I am not convinced that embedding these manifest representations is inherently problematic from a memory perspective. If there are concrete scenarios where this still leads to excessive memory usage, it would be helpful to clarify where the amplification happens beyond what is already expected from manifest list processing.

Happy to be corrected if I am missing something, but wanted to share this observation before we anchor further design decisions on this assumption.

1. https://github.com/apache/polaris/blob/c9efc6c1af202686945efe2e19125e8f116a0206/runtime/service/src/main/java/org/apache/polaris/service/task/TableCleanupTaskHandler.java#L194

Yufei
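For reference, the distinction is visible in the Iceberg API itself. A minimal sketch (illustrative only; `snapshot` and `io` are assumed handles):
---
import org.apache.iceberg.DataFile;
import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.ManifestFiles;
import org.apache.iceberg.io.CloseableIterable;

// allManifests() reads only the manifest list: each ManifestFile is a small
// descriptor (path, length, partition summaries, entry counts), not the
// manifest's contents.
for (ManifestFile mf : snapshot.allManifests(io)) {
  System.out.printf("%s (%d bytes on disk)%n", mf.path(), mf.length());

  // Only this step would pull the actual manifest entries onto the heap:
  try (CloseableIterable<DataFile> entries = ManifestFiles.read(mf, io)) {
    // iterate entries lazily if needed ...
  }
}
---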
On Tue, Jan 6, 2026 at 8:46 PM Yufei Gu <[email protected]> wrote:

Thanks Adam and Robert for the replies.

Just to make sure I am understanding this correctly: the core issue the proposal is addressing is described in https://github.com/apache/polaris/issues/2365. Today, the full binary Iceberg manifest files, base64 encoded, are embedded in the task entities. As a result, when a purge runs, all manifests for a table end up being materialized in memory at once. This behavior creates significant heap pressure and may lead to out-of-memory failures during purge operations.

Please let me know if this matches your intent, or if I am missing anything.

Yufei
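Schematically, the pattern described above looks like the following; the names are hypothetical and do not match the actual Polaris entity model:
---
import java.util.Base64;
import java.util.List;
import java.util.stream.Collectors;

record TaskEntity(String payloadBase64) {}

// All encoded payloads are built up and held at the same time, so peak heap
// usage is the sum of every serialized manifest representation, plus the
// roughly 33% size overhead that base64 adds.
List<TaskEntity> tasks = serializedManifests.stream() // assumed List<byte[]>
    .map(bytes -> new TaskEntity(Base64.getEncoder().encodeToString(bytes)))
    .collect(Collectors.toList());
---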
On Sat, Dec 20, 2025 at 4:53 AM Robert Stupp <[email protected]> wrote:

Hi all,

Thanks Adam! You're spot on!

I wanted to separate the discussions about this functionality and the related async & reliable tasks proposal [1].

The functionality of this one is generally intended for long(er) running operations against object stores, and it already provides the necessary functionality to fix the existing OOM issue.

Robert

[1] https://lists.apache.org/thread/kqm0w38p7bnojq455yz7d2vdfp6ky1h7

On Fri, Dec 19, 2025 at 3:43 PM Adam Christian <[email protected]> wrote:

Howdy Robert,

I reviewed the PR and it is very clean. I really enjoy the simplicity of the FileOperations interface. I think that's going to be a great extension point.

One of the things I was wondering about was what to do with the current purge implementation. I understand that it has a high likelihood of running into an out-of-memory error [1]. I'm wondering if your idea was to build this, then replace the current implementation. I'd love to ensure that we have a plan for one clean, stable implementation.

[1] https://github.com/apache/polaris/issues/2365

Go community,

Adam

On Tue, Dec 16, 2025 at 10:25 AM Adam Christian <[email protected]> wrote:

Hi Yufei,

Great questions! From what I can see in the PR, here are the answers to your questions:
1. The first major scenario is improving the memory concerns with purge. That's important to stabilize a core use case in the service.
2. These are related specifically to file operations. I cannot see a way that it would be broader than that.

Go community,

Adam

On Mon, Dec 15, 2025, 3:20 PM Yufei Gu <[email protected]> wrote:

Hi Robert,

Thanks for sharing the proposal and the PR. Before diving deeper into the API shape, I was hoping to better understand the intended use cases you have in mind:

1. What concrete scenarios are you primarily targeting with these long-running object store operations?
2. Are these mostly expected to be file/object-level maintenance tasks (e.g. purge, cleanup), or do you envision broader categories of operations leveraging the same abstraction?

Having a clearer picture of the motivating use cases would help evaluate the right level of abstraction and where this should live architecturally.

Looking forward to the discussion.

Yufei

On Fri, Dec 12, 2025 at 3:48 AM Robert Stupp <[email protected]> wrote:

Hi all,

I'd like to propose an API and corresponding implementation [1] for (long-running) object store operations.

It provides a CPU- and heap-friendly API and implementation to work against object stores. It is built in a way to provide "pluggable" functionality. What I mean is this (Java pseudo code):
---
FileOperations fileOps =
    fileOperationsFactory.createFileOperations(fileIoInstance);
Stream<FileSpec> allIcebergTableFiles =
    fileOps.identifyIcebergTableFiles(metadataLocation);
PurgeStats purged = fileOps.purge(allIcebergTableFiles);
// or simpler:
PurgeStats purged = fileOps.purgeIcebergTable(metadataLocation);
// or similarly for Iceberg views
PurgeStats purged = fileOps.purgeIcebergView(metadataLocation);
// or to purge all files underneath a prefix
PurgeStats purged = fileOps.purge(fileOps.findFiles(prefix));
---

Not mentioned in the pseudo code is the ability to rate-limit the number of purged files or batch-deletions, and to configure the deletion batch-size.
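As one possible shape for that, not the PR's actual implementation, a sketch assuming Guava's RateLimiter and Iceberg's SupportsBulkOperations for batch deletes:
---
import com.google.common.collect.Iterators;
import com.google.common.util.concurrent.RateLimiter;
import java.util.Iterator;
import java.util.List;
import org.apache.iceberg.io.SupportsBulkOperations;

// Delete files in fixed-size batches, throttled to a target deletion rate.
void purgeInBatches(SupportsBulkOperations io, Iterator<String> paths,
                    int batchSize, double deletesPerSecond) {
  RateLimiter limiter = RateLimiter.create(deletesPerSecond);
  Iterator<List<String>> batches = Iterators.partition(paths, batchSize);
  while (batches.hasNext()) {
    List<String> batch = batches.next();
    limiter.acquire(batch.size()); // block until the rate budget allows
    io.deleteFiles(batch);         // one bulk delete per batch
  }
}
---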
The PR already contains tests against an on-heap object store mock, plus integration tests against S3/GCS/Azure emulators.

More details can be found in the README [2] included in the PR and, of course, in the code in the PR.

Robert

[1] https://github.com/apache/polaris/pull/3256
[2] https://github.com/snazy/polaris/blob/obj-store-ops/storage/files/README.md
