Dear community,

I would like to start a discussion around a potential improvement to
planning-time memory usage for large tables with a high volume of delete
files.

When planning queries on large tables, especially delete-heavy tables, the
planner currently keeps all delete file metadata in memory for the entire
planning phase. For tables with many partitions and a large number of
delete files, this can significantly increase memory pressure and, in
extreme cases, lead to OOM issues during planning.

*Proposal*

The core idea is to allow delete file metadata to be released incrementally
during planning, instead of being retained until the end.

I've opened a PR that shows what this looks like:
https://github.com/apache/iceberg/pull/14558

Concretely, the proposal is to make ManifestGroup closeable so it can
proactively release memory once it is no longer needed. The release logic
is based on *partition reference counting* (see the sketch after this
list):

   - At the beginning of planning, we track the reference count of
     partitions across all data manifests.
   - As each data manifest finishes planning, the reference count for its
     associated partitions is decremented.
   - Once a partition is no longer referenced by any remaining data files,
     its related delete files are no longer needed for planning.
   - At that point, we use the partition value to remove and release the
     corresponding entries from DeleteFileIndex.
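
To make the bookkeeping concrete, here is a minimal sketch of the
reference-counting idea. Everything below is a hypothetical illustration
rather than the PR's actual code: PartitionRefCounter, retain, and release
are invented names, and partitions are keyed by plain strings for brevity,
whereas the real change would wire this logic into ManifestGroup and
DeleteFileIndex, where partition values are StructLike.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch of the partition reference-counting release
    // logic; names and types are illustrative, not the PR's implementation.
    public class PartitionRefCounter {
      private final Map<String, Integer> refCounts = new HashMap<>();

      // At the beginning of planning: count each data manifest that
      // references a given partition.
      public synchronized void retain(String partition) {
        refCounts.merge(partition, 1, Integer::sum);
      }

      // As each data manifest finishes planning: decrement its partitions.
      // Returns true once no remaining data manifest references the
      // partition, i.e. its delete files can be dropped from the index.
      public synchronized boolean release(String partition) {
        Integer remaining =
            refCounts.computeIfPresent(partition, (p, c) -> c - 1);
        if (remaining != null && remaining == 0) {
          refCounts.remove(partition);
          return true; // caller can now evict the partition's delete entries
        }
        return false;
      }

      public static void main(String[] args) {
        PartitionRefCounter counter = new PartitionRefCounter();

        // Planning start: two data manifests reference the same partition.
        counter.retain("date=2024-01-01");
        counter.retain("date=2024-01-01");

        // First manifest finishes: still referenced, keep its delete files.
        System.out.println(counter.release("date=2024-01-01")); // false

        // Second manifest finishes: the partition's delete entries can be
        // released now instead of being held until planning ends.
        System.out.println(counter.release("date=2024-01-01")); // true
      }
    }

With ManifestGroup closeable, close() would also be a natural place to
drop whatever the counter still tracks if planning ends early.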

*Discussion*

I would appreciate feedback on:

   - Whether this approach aligns with Iceberg’s planning model and
     lifecycle expectations.
   - Any edge cases or correctness concerns you foresee.
