- How do you plan to build the partition counts? Will it require opening
data manifests as the manifest list only contains partition bounds?
- Today we stream through data files and emit scan tasks. Will you be able
to preserve this behavior?
- Do you have any JMH benchmarks to validate your idea?

- Anton


On Tue, Jan 13, 2026 at 8:08 PM Jian Chen <[email protected]> wrote:

> Dear community,
>
> I would like to start a discussion around a potential improvement to
> planning-time memory usage for large tables with a high volume of delete
> files.
>
> When planning queries on large tables, especially delete-heavy tables, the
> planner currently keeps all delete file metadata in memory for the entire
> planning phase. For tables with many partitions and a large number of
> delete files, this can significantly increase memory pressure and, in
> extreme cases, lead to OOM issues during planning.
>
> *Proposal*
>
> The core idea is to allow delete file metadata to be released
> incrementally during planning, instead of being retained until the end.
>
> I've opened a PR that shows what this looks like:
> https://github.com/apache/iceberg/pull/14558
>
> Concretely, the proposal is to make ManifestGroup closeable so it can
> proactively release memory once it is no longer needed. The release logic
> is based on *partition reference counting:*
>
>    - At the beginning of planning, we track the reference count of
>      partitions across all data manifests.
>    - As each data manifest finishes planning, the reference count for its
>      associated partitions is decremented.
>    - Once a partition is no longer referenced by any remaining data files,
>      its related delete files are no longer needed for planning.
>    - At that point, we use the partition value to remove and release the
>      corresponding entries from DeleteFileIndex.
>
> *Discussion*
>
> I would appreciate feedback on:
>
>    - Whether this approach aligns with Iceberg’s planning model and
>      lifecycle expectations.
>    - Any edge cases or correctness concerns you foresee.
>
>
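For readers following the thread, the reference-counting steps in the quoted proposal can be illustrated with a minimal, self-contained Java sketch. This is not the code from the PR and does not use Iceberg's API: the class `PartitionRefCounter`, its methods, the string partition keys, and the plain map standing in for DeleteFileIndex are all hypothetical. It also assumes that the set of partitions in each data manifest is already known before planning starts, and it ignores partition specs and any deletes that are not scoped to a single partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Iceberg's API and not the PR's implementation.
final class PartitionRefCounter {
  // Partition key -> number of data manifests that still reference it.
  private final Map<String, Integer> refCounts = new HashMap<>();
  // Partition key -> delete file paths; a simplified stand-in for DeleteFileIndex.
  private final Map<String, List<String>> deletesByPartition = new HashMap<>();

  // Registers a delete file under the partition it applies to.
  void addDeleteFile(String partition, String path) {
    deletesByPartition.computeIfAbsent(partition, k -> new ArrayList<>()).add(path);
  }

  // Step 1: call once per (data manifest, partition) pair before planning begins.
  void retain(String partition) {
    refCounts.merge(partition, 1, Integer::sum);
  }

  // Steps 2-4: call when a data manifest has finished planning.
  // Decrements the count of each partition in that manifest and drops the
  // delete entries of any partition whose count reaches zero.
  // Returns the partitions whose delete entries were released by this call.
  List<String> manifestFinished(Iterable<String> partitionsInManifest) {
    List<String> released = new ArrayList<>();
    for (String partition : partitionsInManifest) {
      Integer count = refCounts.get(partition);
      if (count == null) {
        continue; // Partition was never retained; nothing to decrement.
      }
      if (count > 1) {
        refCounts.put(partition, count - 1);
      } else {
        refCounts.remove(partition);
        if (deletesByPartition.remove(partition) != null) {
          released.add(partition);
        }
      }
    }
    return released;
  }

  // Number of delete files still held in memory.
  int retainedDeleteFiles() {
    int total = 0;
    for (List<String> paths : deletesByPartition.values()) {
      total += paths.size();
    }
    return total;
  }
}
```

In this sketch, if two manifests both reference partition `p=1` and only one references `p=2`, finishing the manifest that contains both releases the delete entries for `p=2` immediately, while those for `p=1` stay until the second manifest finishes. How the initial counts are obtained is left open here, which is the cost raised in the first question of the reply.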
