For very wide tables, I think this becomes a problem after only a single-digit number of schema changes. My hypothetical here is a table with 1,000 columns that gets new columns added every hour or so. Unless I want to keep my history limited to 24 hours (or less), schema bloat is going to be a pretty big issue.
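
A rough back-of-the-envelope sketch of that scenario, in case it helps. The 100 bytes per serialized field is a number I made up for illustration, and the function names are hypothetical; the model assumes every schema change adds exactly one column and that no old schemas have been expired yet.

```python
# Assumed average size of one serialized field in metadata.json (illustrative only).
BYTES_PER_FIELD = 100


def full_schema_bytes(base_columns, schema_changes, bytes_per_field=BYTES_PER_FIELD):
    """Bytes used when every schema change stores a complete copy of the schema.

    Assumes each change adds exactly one column to the previous schema.
    """
    total = 0
    for i in range(schema_changes + 1):  # +1 to count the initial schema
        total += (base_columns + i) * bytes_per_field
    return total


def delta_schema_bytes(base_columns, schema_changes, bytes_per_field=BYTES_PER_FIELD):
    """Bytes used with one full base schema plus a one-field delta per change."""
    return (base_columns + schema_changes) * bytes_per_field


if __name__ == "__main__":
    # 1,000 columns, one new column per hour, for 1 day and for 7 days of history.
    for hours in (24, 24 * 7):
        full = full_schema_bytes(1000, hours)
        delta = delta_schema_bytes(1000, hours)
        print(f"{hours:>4} changes: full={full:,} B  delta={delta:,} B")
```

Under these assumptions the full-copy total grows roughly with (columns x changes), while the delta total grows only with (columns + changes), which is why retention length matters so much for the wide-table case.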

On Thu, Feb 12, 2026 at 10:37 AM Ryan Blue <[email protected]> wrote:

> For tables where this is a problem, how are you currently managing older
> schemas? Older schemas do not need to be kept if there aren't any snapshots
> that reference them.
>
> On Thu, Feb 12, 2026 at 10:24 AM Russell Spitzer <
> [email protected]> wrote:
>
>> My gut instinct on this is that it's a great idea. I think we probably
>> need to think a bit more about how to decide on "base" schema promotion but
>> theoretically this seems like it should be a huge benefit for wide tables.
>>
>> On Thu, Feb 12, 2026 at 7:55 AM Talat Uyarer via dev <
>> [email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I am sharing a new proposal for Iceberg Spec v4: *Delta-Encoded Schemas*.
>>> We propose moving away from monolithic schema storage to address a
>>> growing scalability bottleneck in high-velocity and ultra-wide table
>>> environments.
>>>
>>> The current Iceberg Spec re-serializes and appends the entire schema
>>> object to metadata.json for every schema operation, which leads to
>>> massive schema data replication. For a large table with 5,000 columns+
>>> with frequent schema updates, this can result in metadata files exceeding
>>> GBs, causing significant query planning latencies and OOM driver side.
>>>
>>> *Proposal Summary:*
>>>
>>> We propose implementing *Delta-Encoded Schema Evolution for Spec v4* using
>>> a *"Merge-on-Read" (MoR) approach for metadata*. This approach involves
>>> transitioning the schemas field from "Full Snapshots" to a sequence of
>>> *Base Schemas* (type full) and *Schema Deltas* (type delta) that store
>>> differential mutations relative to a base ID.
>>>
>>> *Key Goals:*
>>>
>>> - Achieve a *99.4% reduction in the size of schema-related metadata*.
>>> - Drastically lower the storage and IO requirements for metadata.json.
>>> - Accelerate query planning by reducing the JSON payload size.
>>> - Preserve self-containment by keeping the schema in the metadata
>>> file, avoiding external sidecar files.
>>>
>>> The full proposal, including the flat resolution model (no delta
>>> chaining), the defined set of atomic delta operations (add, update,
>>> delete), and the lifecycle/compaction mechanics, is available for
>>> review:
>>>
>>> https://s.apache.org/iceberg-delta-schemas
>>>
>>> I look forward to your feedback and discussion on the dev list.
>>>
>>> Thanks
>>> Talat
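
For anyone skimming the thread, here is a minimal sketch of how I read the "flat resolution" idea in the summary above: each delta references a base schema directly and is applied to it in one step, with no chaining through other deltas. The dictionary layout, the key names (`base-schema-id`, `operations`, `op`, `field`), and the function name are all hypothetical and are not taken from the proposal document.

```python
from copy import deepcopy


def resolve_schema(base, delta):
    """Apply a single delta directly to its base schema (no delta chaining).

    The schema and delta shapes used here are illustrative assumptions only.
    """
    if delta["base-schema-id"] != base["schema-id"]:
        raise ValueError("delta does not reference this base schema")

    # Index base fields by field ID; dicts keep insertion order.
    fields = {f["id"]: deepcopy(f) for f in base["fields"]}

    for op in delta["operations"]:
        kind, field = op["op"], op["field"]
        if kind == "add":
            if field["id"] in fields:
                raise ValueError(f"field {field['id']} already exists")
            fields[field["id"]] = dict(field)
        elif kind == "update":
            fields[field["id"]].update(field)  # overwrite only the changed attributes
        elif kind == "delete":
            del fields[field["id"]]
        else:
            raise ValueError(f"unknown operation: {kind}")

    return {"schema-id": delta["schema-id"], "fields": list(fields.values())}


if __name__ == "__main__":
    base = {
        "schema-id": 0,
        "fields": [
            {"id": 1, "name": "a", "type": "int"},
            {"id": 2, "name": "b", "type": "string"},
        ],
    }
    delta = {
        "schema-id": 1,
        "base-schema-id": 0,
        "operations": [
            {"op": "add", "field": {"id": 3, "name": "c", "type": "string"}},
            {"op": "update", "field": {"id": 1, "type": "long"}},
            {"op": "delete", "field": {"id": 2}},
        ],
    }
    print(resolve_schema(base, delta))
```

Because resolution never walks more than one hop, the read cost of any schema is bounded by the base size plus one delta, which is also why the choice of when to promote a new base matters.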
