Hi Jian,

Thank you for your recent contribution. To help others better understand the
proposed solution, would it be possible to include some additional test cases?
It would also be very helpful if you could provide a brief write-up with an
example illustrating how scan planning worked previously and how it behaves
with the new approach. I also think the memory usage snapshot is missing.

This context will make it easier for others to review and provide feedback.
Thanks so much for your efforts!

Best regards,
Vaibhav

On Wed, Jan 14, 2026 at 10:40 AM Jian Chen <[email protected]> wrote:
>
> Hi Anton,
>
> Thanks for the quick reply.
>
> - How do you plan to build the partition counts? Will it require opening
>   data manifests as the manifest list only contains partition bounds?
> We need to open data manifests to get the partition info when building the
> ManifestGroup. I don't see a good/easy way to list all partitions in
> Iceberg today; do you know of one?
>
> - Today we stream through data files and emit scan tasks. Will you be able
>   to preserve this behavior?
> Yes, the behavior stays the same; we just add an additional close action
> when planning finishes for each data manifest.
>
> - Do you have any JMH benchmarks to validate your idea?
> I don't have JMH benchmarks for the performance, but I did profile memory
> allocation: tested with 3 partitions holding large amounts of data and
> 10k+ delete files, on Trino 477 and Iceberg 1.10. The memory usage
> snapshot is attached below.
>
>
> On Wed, Jan 14, 2026 at 12:26, Anton Okolnychyi <[email protected]> wrote:
>>
>> - How do you plan to build the partition counts? Will it require opening
>>   data manifests as the manifest list only contains partition bounds?
>> - Today we stream through data files and emit scan tasks. Will you be
>>   able to preserve this behavior?
>> - Do you have any JMH benchmarks to validate your idea?
>>
>> - Anton
>>
>>
>> On Tue, Jan 13, 2026 at 8:08 PM Jian Chen <[email protected]> wrote:
>>>
>>> Dear community,
>>>
>>> I would like to start a discussion around a potential improvement to
>>> planning-time memory usage for large tables with a high volume of
>>> delete files.
>>>
>>> When planning queries on large tables, especially delete-heavy tables,
>>> the planner currently keeps all delete file metadata in memory for the
>>> entire planning phase. For tables with many partitions and a large
>>> number of delete files, this can significantly increase memory pressure
>>> and, in extreme cases, lead to OOM issues during planning.
>>>
>>> Proposal
>>>
>>> The core idea is to allow delete file metadata to be released
>>> incrementally during planning, instead of being retained until the end.
>>>
>>> I've opened a PR that shows what this looks like:
>>> https://github.com/apache/iceberg/pull/14558
>>>
>>> Concretely, the proposal is to make ManifestGroup closeable so it can
>>> proactively release memory once it is no longer needed. The release
>>> logic is based on partition reference counting:
>>>
>>> - At the beginning of planning, we track the reference count of
>>>   partitions across all data manifests.
>>> - As each data manifest finishes planning, the reference count for its
>>>   associated partitions is decremented.
>>> - Once a partition is no longer referenced by any remaining data files,
>>>   its related delete files are no longer needed for planning.
>>> - At that point, we use the partition value to remove and release the
>>>   corresponding entries from DeleteFileIndex.
>>>
>>> Discussion
>>>
>>> I would appreciate feedback on:
>>>
>>> - Whether this approach aligns with Iceberg's planning model and
>>>   lifecycle expectations.
>>> - Any edge cases or correctness concerns you foresee.
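To make the partition reference-counting idea in the quoted proposal easier to
picture, here is a rough, illustrative Java sketch. It is not the code from the
PR: PartitionReferenceTracker, its generic partition-key type K, and the
releaseAction callback are made-up placeholders standing in for whatever the
actual ManifestGroup / DeleteFileIndex changes use.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative only: counts how many data manifests still reference each
// partition key and releases that partition's delete-file entries once the
// count reaches zero.
final class PartitionReferenceTracker<K> {
  private final Map<K, Integer> refCounts = new HashMap<>();
  private final Consumer<K> releaseAction;

  PartitionReferenceTracker(Consumer<K> releaseAction) {
    // releaseAction stands in for "drop this partition's entries from the
    // DeleteFileIndex" in the real proposal.
    this.releaseAction = releaseAction;
  }

  // Called once per data manifest at the start of planning.
  void register(List<K> partitionsInManifest) {
    for (K partition : partitionsInManifest) {
      refCounts.merge(partition, 1, Integer::sum);
    }
  }

  // Called when a data manifest finishes planning; partitions whose count
  // drops to zero are released immediately instead of at the end of planning.
  void manifestFinished(List<K> partitionsInManifest) {
    for (K partition : partitionsInManifest) {
      Integer remaining = refCounts.computeIfPresent(partition, (k, v) -> v - 1);
      if (remaining != null && remaining == 0) {
        refCounts.remove(partition);
        releaseAction.accept(partition);
      }
    }
  }
}

The point, as I read it, is that delete-file metadata for a partition can be
dropped as soon as the last data manifest referencing that partition finishes
planning, so the retained index shrinks as planning progresses instead of
staying at its peak until the very end.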
