Thanks Kevin, applied your suggestion; it reads tighter this way. And right on cue, this bit someone again on the Go side last week (https://github.com/apache/iceberg-go/pull/1176): compacting a Spark-written days-partitioned table blew up with "cannot use time.Time with Avro type int". So the sooner this clarification lands, the sooner we stop re-litigating it every few months. :)
One process note: since it touches format/spec.md, the contributor guide treats it as a spec change that needs a formal vote, even for a clarification (no lazy-consensus modifier). You've already approved, and so has Russell, so we're almost there: with a couple more PMC +1s on the vote thread we can merge https://github.com/apache/iceberg/pull/16446 and close the loop. I'll start a [VOTE] thread now to make it official.

Best,
Andrei

On Thu, Jun 11, 2026 at 2:37 AM Kevin Liu wrote:
> We never closed the loop on this :)
>
> I have one suggestion to keep the explanation format-agnostic, please take a look!
> https://github.com/apache/iceberg/pull/16446#pullrequestreview-4472647904
> I'm also happy to merge the PR as is. The most important part is to change the result type from `int` -> `date`
>
> Best,
> Kevin Liu
>
> On Fri, May 22, 2026 at 9:00 PM Gang Wu wrote:
>
>> FWIW, iceberg-cpp also produces a date type for the day transform, so we are happy with the consensus here.
>>
>> On Sat, May 23, 2026 at 12:14 AM Kevin Liu wrote:
>> >
>> > Good to know about the Avro spec behavior, thanks Ryan.
>> >
>> > And thank you Andrei for driving the spec clarification. I'll comment on the PR. I don't think we need a vote since this is a clarification and not a change.
>> >
>> > On Thu, May 21, 2026 at 1:42 PM Andrei Tserakhau via dev wrote:
>> >>
>> >> Thanks Kevin, Fokko, and Ryan, looks like we've converged.
>> >>
>> >> Summary of where this lands:
>> >>
>> >> - Result type for day becomes date, matching Java/PyIceberg/Rust's default behavior and the Avro types table in Appendix A.
>> >> - Reader tolerance for historical plain-int manifests is inherited from the Avro spec itself (thanks Ryan for surfacing that; it saves us an Iceberg-side MUST clause).
>> >> - A short note is added under the partition transforms table capturing the historical context, so this doesn't get re-litigated the next time someone reads the spec without the back-story.
>> >>
>> >> PR is updated accordingly: https://github.com/apache/iceberg/pull/16446
>> >>
>> >> Fokko, Kevin, Ryan -- would appreciate a look when you have a moment. Happy to iterate further on the note wording if anything reads off.
>> >>
>> >> For iceberg-go, I'll follow up with the writer + reader alignment (PR #915 in iceberg-go is already in flight) once the spec change lands.
>> >>
>> >> Best,
>> >> Andrei
>> >>
>> >> On Thu, May 21, 2026 at 9:41 PM Ryan Blue wrote:
>> >>>
>> >>> Ugh, I think I sent from the wrong email address and my reply didn't go through.
>> >>>
>> >>> Other people have covered the same things here, except for one point: the Avro spec states that readers that don't support an annotation are required to ignore it. So the behavior to read either date or int correctly is inherited from the Avro spec.
>> >>>
>> >>> Ryan
>> >>>
>> >>> On Thu, May 21, 2026 at 10:17 AM Kevin Liu wrote:
>> >>>>
>> >>>> I wasn't aware of the previous back-and-forth changes to this line in the spec. Thanks for the extra context!
>> >>>>
>> >>>> A couple of points I want to align on:
>> >>>> 1. All implementations except Go, including Java, Python, and Rust, write the day transform result as an Iceberg date type. That maps to the Avro date type and is serialized as { "type": "int", "logicalType": "date" }.
>> >>>> 2. The Go implementation writes the day transform result as an Iceberg int type. That maps to the Avro int type and is serialized as { "type": "int" }.
>> >>>> 3. Java, Python, and Rust can read Avro manifest partition values as either an Avro int type or an Avro date type.
>> >>>> 4. The Go implementation can currently read Avro manifest partition values only as an Avro int type. This is the original issue that sparked this conversation.
>> >>>>
>> >>>> Since the spec has gone back and forth between writing this as an Iceberg int and an Iceberg date, I think readers must accept both. We can include that as an implementation note.
>> >>>>
>> >>>> I support changing the spec back to date so it matches the default behavior for day partition values in our implementations. Go is also making the change to write date instead of int.
>> >>>> The other approach, updating all implementations to match the current spec, would be a lot of work for little value.
>> >>>>
>> >>>> Hopefully this is the last time we make this change to the spec :)
>> >>>> Would love to hear from others.
>> >>>>
>> >>>> Best,
>> >>>> Kevin Liu
>> >>>>
>> >>>> On Wed, May 20, 2026 at 10:39 AM Fokko Driesprong wrote:
>> >>>>>
>> >>>>> > It wouldn't be the first time we've retroactively updated the spec when finding inconsistencies with the current implementations :P
>> >>>>>
>> >>>>> I think generally we try to avoid this, but in this case it was changed too many times :P Maybe we should revert the spec change:
>> >>>>>
>> >>>>> https://github.com/apache/iceberg/pull/5980/changes#diff-36347a47c3bf67ea2ef6309ea96201814032d21bb5f162dfae4045508c15588a
>> >>>>>
>> >>>>> Curious to hear what others think.
>> >>>>>
>> >>>>> Kind regards,
>> >>>>> Fokko
>> >>>>>
>> >>>>>
>> >>>>> On 2026/05/20 17:24:22 Matt Topol wrote:
>> >>>>> > It wouldn't be the first time we've retroactively updated the spec when finding inconsistencies with the current implementations :P
>> >>>>> >
>> >>>>> > Particularly, in this case even the "reference implementation" (i.e. Java) is technically not spec-compliant, since the spec says that it should be an "int", not an Avro "date" type. If all the implementations currently write a "date" type, then it's silly to have to say that every implementation is violating the spec.
>> >>>>> >
>> >>>>> > If we want the spec to say it should be an int, but tolerate reading an Avro "date" type, that's fine. But that would mean we should update Java, Rust, and PyIceberg to all write plain "int" and no longer write the "date" type, again: it would be silly to say that the reference implementation and 2 other implementations are not following the spec. :P
>> >>>>> >
>> >>>>> > I agree that it would be a big change for little value to update the implementations, so my opinion is that the spec should be updated to either say that "either" is allowed to be written, or that "date" should be written but "int" should be allowed to be read.
>> >>>>> >
>> >>>>> > --Matt
>> >>>>> >
>> >>>>> > On Wed, May 20, 2026 at 1:05 PM Fokko Driesprong wrote:
>> >>>>> > >
>> >>>>> > > Thanks for the quick PR, Andrei.
>> >>>>> > >
>> >>>>> > > The problem is that the note conflicts with the Avro/Iceberg types table: https://iceberg.apache.org/spec/#avro
>> >>>>> > >
>> >>>>> > > I don't think we want to update the implementations, as I agree that it would be a big change for little value. At the same time, I don't think we can retroactively update the spec. Maybe an implementation note would be a better solution to halt the tradition?
>> >>>>> > >
>> >>>>> > > Kind regards,
>> >>>>> > > Fokko
>> >>>>> > >
>> >>>>> > >
>> >>>>> > > On 2026/05/20 16:49:29 Andrei Tserakhau via dev wrote:
>> >>>>> > > > Thanks for the historical context, Fokko!
>> >>>>> > > >
>> >>>>> > > > Quick check that we're aligned, since I think we may be closer than it reads:
>> >>>>> > > >
>> >>>>> > > > My PR leaves the result type table as `int` -- no change to the transform table, no impact on hour/month/etc., no change to the type model.
>> >>>>> > > >
>> >>>>> > > > What the PR clarifies is the Avro encoding used when serializing a `day` partition field into a manifest. Empirically today, Java, PyIceberg, and Rust all write `{ "type": "int", "logicalType": "date" }` there (TypeToSchema in Java, DayTransform.result_type in PyIceberg, and Transform::Day.result_type in Rust all produce a Date). Only iceberg-go produces plain Avro `int`. The PR codifies the de facto writer behavior as SHOULD and makes reader tolerance MUST.
>> >>>>> > > >
>> >>>>> > > > If your "stick with int" also covers the Avro annotation, then we'd effectively be reverting three writers and orphaning every existing manifest, which I don't think is a decent path; it's quite a big change for small benefits.
>> >>>>> > > >
>> >>>>> > > > Either way, super happy to adjust the spec change; the goal is to stop the tradition of re-litigating this issue every year because someone misreads this part of the spec.
>> >>>>> > > >
>> >>>>> > > > Best,
>> >>>>> > > > Andrei
>> >>>>> > > >
>> >>>>> > > > On Wed, May 20, 2026 at 6:37 PM Fokko Driesprong wrote:
>> >>>>> > > >
>> >>>>> > > > > Thanks for bringing this up, Kevin, a gift that keeps on giving :)
>> >>>>> > > > > https://github.com/apache/iceberg/issues/10616#issuecomment-2200191427
>> >>>>> > > > >
>> >>>>> > > > > 1. I think we should stick with the int type as defined in the spec.
>> >>>>> > > > > 2. It feels to me that some readers are more permissive here than others. I believe some allow reading date as an int without throwing. Practically, readers should read both.
>> >>>>> > > > > 3. Unfortunately, I think this is water under the bridge. As shown above in the GitHub issue, we went back and forth, so I don't see a lot of value in switching this to date. All OSS implementations handle this as an int internally, and this also aligns with hour/month/etc.
>> >>>>> > > > >
>> >>>>> > > > > Hope this historical context helps.
>> >>>>> > > > >
>> >>>>> > > > > Kind regards,
>> >>>>> > > > > Fokko
>> >>>>> > > > >
>> >>>>> > > > >
>> >>>>> > > > > On 2026/05/20 16:33:51 Andrei Tserakhau via dev wrote:
>> >>>>> > > > > > Here is a fast follow with a PR:
>> >>>>> > > > > > https://github.com/apache/iceberg/pull/16446
>> >>>>> > > > > >
>> >>>>> > > > > > Best,
>> >>>>> > > > > > Andrei
>> >>>>> > > > > >
>> >>>>> > > > > > On Wed, May 20, 2026 at 6:11 PM Andrei Tserakhau wrote:
>> >>>>> > > > > >
>> >>>>> > > > > > > Thanks for raising this, Kevin.
>> >>>>> > > > > > >
>> >>>>> > > > > > > Speaking as an iceberg-go maintainer, even though Go is the implementation that has to move, I'd vote:
>> >>>>> > > > > > >
>> >>>>> > > > > > > 1. Writers SHOULD emit { "type": "int", "logicalType": "date" }.
>> >>>>> > > > > > > 2. Readers MUST accept both plain `int` and `int` annotated with `logicalType: date`.
>> >>>>> > > > > > > 3. Keep the transform result type table as-is (`int` as the logical Iceberg type). Don't change it to `date`. Add a separate, normative manifest-encoding clause so projection and expression-evaluation semantics that depend on the type model stay untouched.
>> >>>>> > > > > > >
>> >>>>> > > > > > > Reasoning: when Java, PyIceberg, and Rust all write logical `date`, that's the de facto wire format. Forcing them to switch to plain `int` to match a literal reading of the transform table would churn three implementations and leave every existing manifest "non-conforming" forever. Aligning Go with the dominant writer convention costs one implementation change (PR #915 already proposes it) and zero historical churn.
>> >>>>> > > > > > >
>> >>>>> > > > > > > The underlying ambiguity is that "result type" (logical Iceberg type) and "Avro manifest encoding" (wire format) were conflated. Separating them in spec text removes the ambiguity without changing the type system.
>> >>>>> > > > > > >
>> >>>>> > > > > > > Happy to drive the spec PR and then iceberg-go writer + reader alignment.
>> >>>>> > > > > > >
>> >>>>> > > > > > > Best,
>> >>>>> > > > > > > Andrei
>> >>>>> > > > > > >
>> >>>>> > > > > > > On Tue, May 19, 2026 at 5:45 PM Kevin Liu wrote:
>> >>>>> > > > > > >
>> >>>>> > > > > > >> Hi all,
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> I'd like to invite the community to discuss a spec ambiguity in Apache Iceberg that has caused some confusion across implementations. We've seen this come up in Python, Rust, and now Go.
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> The issue: the spec documents the `day` partition transform's result type as plain `int`, but Java, PyIceberg, and Rust all write manifest partition fields using Avro's logical `date` type. Go currently writes plain `int`, which is the strict reading of the spec. Since both forms have the same physical representation, the difference is only the Avro schema annotation -- but it's worth clarifying the spec so all implementations are aligned.
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> The full analysis, including a breakdown of each implementation's writer/reader behavior and proposed resolution options, is here: https://github.com/apache/iceberg/issues/16414
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> At a high level, the questions for the community are:
>> >>>>> > > > > > >> 1. What should implementations write: Avro `int` (plain integer) or Avro `date` (integer with a date logical type)?
>> >>>>> > > > > > >> 2. Should implementations be required to read both forms, or just encouraged to?
>> >>>>> > > > > > >> 3. Should the spec's transform result type table be updated from `int` to `date`?
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> I'd love to hear your thoughts. Thanks!
>> >>>>> > > > > > >>
>> >>>>> > > > > > >> Best,
>> >>>>> > > > > > >> Kevin Liu
