It looks like there is enough interest in the community and a few good ways
to make the generation of the root metadata optional with some smarter
catalogs. This would get us to truly single file commits in V4 without
sacrificing portability if catalogs are required to generate the root
metadata file on demand.

What should we do with built-in catalogs? Streaming appends should be
supported out of the box without requiring aggressive table maintenance. It
seems that not writing the root metadata for HMS is going to be a lot of
work. Any thoughts? Should we then pursue offloading the snapshot history
for such catalogs?

On Tue, Feb 10, 2026 at 6:53 PM Anoop Johnson <[email protected]> wrote:

> Agree that snapshot history is the main bloat factor. We've seen
> fast-moving tables where writing the metadata.json file takes several
> seconds. For comparison, Delta Lake uses an efficient binary-search-based
> time travel that can scale to O(millions) of table versions.
>
> Rather than limiting snapshot retention, we might want to consider adding
> time travel directly to the IRC spec. The catalog could implement scalable
> time travel using appropriate indexing. The GetTable API could accept an
> optional `AS OF` timestamp param and return the table metadata as of that
> timestamp. This would enable catalog implementations to choose their own
> time travel strategy (indexes, bloom filters, etc.). Catalogs that don't
> support time travel could return an error, and clients would fall back to
> the current behavior.
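>
> To make this concrete, here is a minimal client-side sketch in Java. It is
> purely illustrative: the "as-of-timestamp" query parameter and the
> error-based fallback are assumptions about what the API could look like,
> not part of the current IRC spec.
>
>   import java.net.URI;
>   import java.net.http.HttpClient;
>   import java.net.http.HttpRequest;
>   import java.net.http.HttpResponse;
>
>   class AsOfLoadSketch {
>     // Load table metadata as of a timestamp; the parameter name is made
>     // up for illustration.
>     static String loadTableAsOf(String baseUrl, String ns, String table,
>                                 long tsMillis) throws Exception {
>       URI uri = URI.create(baseUrl + "/v1/namespaces/" + ns + "/tables/"
>           + table + "?as-of-timestamp=" + tsMillis);
>       HttpResponse<String> resp = HttpClient.newHttpClient().send(
>           HttpRequest.newBuilder(uri).GET().build(),
>           HttpResponse.BodyHandlers.ofString());
>       if (resp.statusCode() == 406) {
>         // Catalog does not support time travel: the client falls back to
>         // loading current metadata and scanning the snapshot log itself.
>         throw new UnsupportedOperationException("no AS OF support");
>       }
>       return resp.body(); // table metadata as of the requested timestamp
>     }
>   }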
>
> I also like Prashant's solution to the portability concern by having
> catalogs materialize the metadata.json on-demand through an export API when
> needed for migration scenarios.
>
>
> On Tue, Feb 10, 2026 at 6:27 PM Gang Wu <[email protected]> wrote:
>
>> It seems that we are discussing two orthogonal approaches:
>>
>> 1. Making the writing of the complete metadata.json file optional
>> during a commit, especially for catalogs that can manage metadata
>> themselves.
>> 2. Restructuring the metadata.json file (e.g., by offloading growing
>> parts like snapshot history to external files) to limit its size and
>> reduce write I/O, while still requiring the root file on every commit
>> for portability.
>>
>> I believe both approaches are worth exploring because in some cases
>> portability is still a top priority.
>>
>> Best,
>> Gang
>>
>> On Wed, Feb 11, 2026 at 9:27 AM Manu Zhang <[email protected]>
>> wrote:
>> >
>> > Can we add an abstraction to the spec, like a root metadata (or
>> snapshot history) manager, with the default implementation being
>> metadata.json?
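>> >
>> > As a minimal sketch of what that abstraction could look like (all names
>> here are illustrative; nothing like this exists in the spec today):
>>
>>   import org.apache.iceberg.TableMetadata;
>>
>>   // Hypothetical abstraction; a file-based metadata.json store would be
>>   // the default implementation.
>>   interface RootMetadataStore {
>>     // Persist metadata for a new commit. The default writes a new
>>     // metadata.json file; a catalog-backed store could keep the state
>>     // internally and skip the file write entirely.
>>     void commit(TableMetadata updated);
>>
>>     // Materialize a self-contained metadata.json on demand, e.g. for
>>     // migration, static tables, or debugging; returns its location.
>>     String exportToFile(String targetLocation);
>>   }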
>> >
>> >
>> > On Wed, Feb 11, 2026 at 9:07 AM Prashant Singh <
>> [email protected]> wrote:
>> >>
>> >> +1. I think snapshot summary bloat was a major factor in metadata.json
>> growth too, especially for streaming writers, based on my past experience.
>> One workaround that avoided a spec change was to enforce a strict limit on
>> how many snapshots to keep and let remove-orphans do the cleanup. We also
>> removed the snapshot summaries, since they are optional anyway and
>> streaming mode creates a large number of snapshots (not all of which were
>> needed).
>> >> There has also been a lot of interesting discussion on optimizing
>> reads [1] as well as writes [2] if we are open to relaxing the spec a bit.
>> It would be nice to move the tracking of the metadata to the catalog, with
>> a protocol to retrieve it back without compromising portability. Maybe we
>> can have a dedicated API that exports it to a file: in the intermediate
>> state we operate on what is stored in the catalog and only materialize the
>> file when and if asked. We are having a similar discussion in IRC.
>> >>
>> >> In any case, I think we all acknowledge that this is a real problem
>> for streaming writers :)!
>> >>
>> >> Past discussions :
>> >> [1] https://lists.apache.org/thread/pwdd7qmdsfcrzjtsll53d3m9f74d03l8
>> >> [2] https://github.com/apache/iceberg/issues/2723
>> >>
>> >> Best,
>> >> Prashant Singh
>> >>
>> >> On Tue, Feb 10, 2026 at 4:45 PM Anton Okolnychyi <
>> [email protected]> wrote:
>> >>>
>> >>> I think Yufei is right and the snapshot history is the main
>> contributor. Streaming jobs that write every minute would generate over
>> 10K snapshot entries per week. We had a similar problem with the list of
>> manifests that kept growing (until we added manifest lists) and with
>> references to previous metadata files (we only keep the last 100 now). So
>> we can definitely come up with something for snapshot entries. We will have
>> to ensure the entire set of snapshots is reachable from the latest root
>> file, even if it requires multiple IO operations.
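>> >>>
>> >>> (For concreteness: one commit per minute is 60 * 24 * 7 = 10,080
>> snapshot entries after a single week.)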
>> >>>
>> >>> The main question is whether we still want to require writing root
>> JSON files during commits. If so, our commits will never be single file
>> commits. In V4, we will have to write the root manifest as well as the root
>> metadata file. I would prefer the second to be optional but we will need to
>> think about static tables and how to incorporate that in the spec.
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Feb 10, 2026 at 3:58 PM Yufei Gu <[email protected]> wrote:
>> >>>>
>> >>>> AFAIK, the snapshot history is the main, if not the only, reason for
>> the large metadata.json file. Moving the extra snapshot history to an
>> additional file and keeping it referenced in the root one may just resolve
>> the issue.
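>> >>>>
>> >>>> A rough Java sketch of how resolution could work with such a
>> reference (everything here is hypothetical; the point is that time travel
>> into old history costs one extra read):
>>
>>   import java.util.List;
>>   import java.util.Optional;
>>
>>   // Hypothetical layout: a bounded recent window stays inline in
>>   // metadata.json, older entries live in an external history file.
>>   class OffloadedSnapshotLog {
>>     record Entry(long timestampMillis, long snapshotId) {}
>>
>>     private final List<Entry> recentWindow;     // inline in the root file
>>     private final String olderHistoryLocation;  // referenced external file
>>
>>     OffloadedSnapshotLog(List<Entry> recent, String olderLocation) {
>>       this.recentWindow = recent;
>>       this.olderHistoryLocation = olderLocation;
>>     }
>>
>>     // Resolve the snapshot current as of ts: check the inline window
>>     // first; only read the offloaded file for older timestamps.
>>     Optional<Entry> asOf(long ts) {
>>       Optional<Entry> hit = latestAtOrBefore(recentWindow, ts);
>>       return hit.isPresent() ? hit : latestAtOrBefore(loadOlder(), ts);
>>     }
>>
>>     private static Optional<Entry> latestAtOrBefore(List<Entry> log,
>>                                                     long ts) {
>>       return log.stream()
>>           .filter(e -> e.timestampMillis() <= ts)
>>           .reduce((a, b) -> b); // log is ordered oldest to newest
>>     }
>>
>>     private List<Entry> loadOlder() {
>>       // One extra IO against olderHistoryLocation; reading elided here.
>>       return List.of();
>>     }
>>   }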
>> >>>>
>> >>>> Yufei
>> >>>>
>> >>>>
>> >>>> On Tue, Feb 10, 2026 at 3:27 PM huaxin gao <[email protected]>
>> wrote:
>> >>>>>
>> >>>>> +1, I think this is a real problem, especially for streaming /
>> frequent appends where commit latency matters and metadata.json keeps
>> getting bigger.
>> >>>>>
>> >>>>> I also agree we probably shouldn’t remove the root metadata file
>> completely. Having one file that describes the whole table is really useful
>> for portability and debugging.
>> >>>>>
>> >>>>> Of the options you listed, I like “offload pieces to external
>> files” as a first step. We still write the root file every commit, but it
>> won’t grow as fast. The downside is extra maintenance/GC complexity.
>> >>>>>
>> >>>>> A couple questions/ideas:
>> >>>>>
>> >>>>> - Do we have any data on what parts of metadata.json grow the most
>> (snapshots / history / refs)? Even a rough breakdown could help decide what
>> to move out first.
>> >>>>> - Could we do a hybrid: still write the root file every commit, but
>> only keep a “recent window” in it, and move older history to referenced
>> files? (portable, but bounded growth)
>> >>>>> - For “optional on commit”, maybe make it a catalog capability (fast
>> commits if the catalog can serve metadata), but still support an
>> export/materialize step when portability is needed.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Huaxin
>> >>>>>
>> >>>>> On Tue, Feb 10, 2026 at 2:58 PM Anton Okolnychyi <
>> [email protected]> wrote:
>> >>>>>>
>> >>>>>> I don't think we have any consensus or concrete plan. In fact, I
>> don't know what my personal preference is at this point. The intention of
>> this thread is to gain that clarity. I don't think removing the root
>> metadata file entirely is a good idea. It is great to have a way to
>> describe the entire state of a table in a file. We just need to find a
>> solution for streaming appends that suffer from the increasing size of the
>> root metadata file.
>> >>>>>>
>> >>>>>> Like I said, making the generation of the json file on commit
>> optional is one way to solve this problem. We can also think about
>> offloading pieces of it to external files (say old snapshots). This would
>> mean we still have to write the root file on each commit but it will be
>> smaller. One clear downside is more complicated maintenance.
>> >>>>>>
>> >>>>>> Any other ideas/thoughts/feedback? Do people see this as a problem?
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Feb 10, 2026 at 2:18 PM Yufei Gu <[email protected]> wrote:
>> >>>>>>>
>> >>>>>>> Hi Anton, thanks for raising this. I would really like to make
>> this optional and then build additional use cases on top of it. For
>> example, a catalog like IRC could completely eliminate storage IO during
>> commit and load, which is a big win. It could also provide better
>> protection for encrypted Iceberg tables, since metadata.json files are
>> plain text today.
>> >>>>>>>
>> >>>>>>> That said, do we have consensus that metadata.json can be
>> optional? There are real portability concerns, and engine-side work also
>> needs consideration. For example, static tables and the Spark driver still
>> expect to read this file directly from storage. It feels like the first
>> step here is aligning on whether metadata.json can be optional at all,
>> before we go deeper into how to get rid of it. What do you think?
>> >>>>>>>
>> >>>>>>> Yufei
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Tue, Feb 10, 2026 at 11:23 AM Anton Okolnychyi <
>> [email protected]> wrote:
>> >>>>>>>>
>> >>>>>>>> While it may be common knowledge among Iceberg devs that writing
>> the root JSON file on commit is somewhat optional with the right catalog,
>> what can we do in V4 to solve this problem for all? My problem is the
>> suboptimal behavior that new users get by default with HMS or Hadoop
>> catalogs and how this impacts their perception of Iceberg. We are doing a
>> bunch of work for streaming (e.g. changelog scans, single file commits,
>> etc), but the need to write the root JSON file may cancel all of that.
>> >>>>>>>>
>> >>>>>>>> Let me throw some ideas out there.
>> >>>>>>>>
>> >>>>>>>> - Describe how catalogs can make the generation of the root
>> metadata file optional in the spec. Ideally, implement that in a built-in
>> catalog of choice as a reference implementation.
>> >>>>>>>> - Offload portions of the root metadata file to external files
>> and keep references to them.
>> >>>>>>>>
>> >>>>>>>> Thoughts?
>> >>>>>>>>
>> >>>>>>>> - Anton
>> >>>>>>>>
>> >>>>>>>>
>>
>
