> It wouldn't be the first time we've retroactively updated the spec when
> finding inconsistencies with the current implementations :P
I think generally we try to avoid this, but in this case it was changed a few times :P

Maybe we should revert the spec change:
https://github.com/apache/iceberg/pull/5980/changes#diff-36347a47c3bf67ea2ef6309ea96201814032d21bb5f162dfae4045508c15588a

Curious to hear what others think.

Kind regards,
Fokko

On 2026/05/20 17:24:22 Matt Topol wrote:
> It wouldn't be the first time we've retroactively updated the spec
> when finding inconsistencies with the current implementations :P
>
> Particularly, in this case even the "reference implementation" (i.e.
> Java) is technically not spec-compliant since the spec says that it
> should be an "int", not an Avro "date" type. If all the
> implementations currently write a "date" type, then it's silly to have
> to say that every implementation is violating the spec.
>
> If we want the spec to say it should be an int, but tolerate reading
> an Avro "date" type, that's fine. But that would mean we should update
> Java, Rust, and PyIceberg to all write plain "int" and no longer write
> the "date" type, again: it would be silly to say that the reference
> implementation and 2 other implementations are not following the spec.
> :P
>
> I agree that it would be a big change for little value to update the
> implementations, so my opinion is that the spec should be updated to
> either say that "either" is allowed to be written, or that "date"
> should be written but "int" should be allowed to be read.
>
> --Matt
>
> On Wed, May 20, 2026 at 1:05 PM Fokko Driesprong <[email protected]> wrote:
> >
> > Thanks for the quick PR Andrei.
> >
> > The problem is that the note conflicts with the Avro/Iceberg types table:
> > https://iceberg.apache.org/spec/#avro
> >
> > I don't think we want to update the implementations as I agree that it
> > would be a big change for little value. At the same time, I don't think we
> > can retroactively update the spec. Maybe an implementation note would be a
> > better solution to halt the tradition?
> >
> > Kind regards,
> > Fokko
> >
> >
> > On 2026/05/20 16:49:29 Andrei Tserakhau via dev wrote:
> > > Thanks, Fokko, for the historical context!
> > >
> > > Quick check that we're aligned, since I think we may be closer than
> > > it reads:
> > >
> > > My PR leaves the result type table as `int` -- no change to the
> > > transform table, no impact on hour/month/etc., no change to the
> > > type model.
> > >
> > > What the PR clarifies is the Avro encoding used when serializing a
> > > `day` partition field into a manifest. Empirically today, Java,
> > > PyIceberg, and Rust all write `{ "type": "int", "logicalType": "date" }`
> > > there (TypeToSchema in Java, DayTransform.result_type in PyIceberg,
> > > Transform::Day.result_type in Rust all produce a Date). Only
> > > iceberg-go produces plain Avro `int`. The PR codifies the de facto
> > > writer behavior as SHOULD and makes reader tolerance MUST.
> > >
> > > If your "stick with int" also covers the Avro annotation, then we'd
> > > effectively be reverting three writers and orphaning every existing
> > > manifest, which I don't think is a decent path; it's quite a big change
> > > for small benefits.
> > >
> > > Either way, super happy to adjust the spec change; the goal is to
> > > stop this tradition of re-litigating the issue every year by misreading
> > > this part of the spec.
> > >
> > > Best,
> > > Andrei
> > >
> > > On Wed, May 20, 2026 at 6:37 PM Fokko Driesprong <[email protected]> wrote:
> > >
> > > > Thanks for bringing this up Kevin, a gift that keeps on giving :)
> > > > https://github.com/apache/iceberg/issues/10616#issuecomment-2200191427
> > > >
> > > > 1. I think we should stick with the int type as defined in the spec.
> > > > 2. It feels to me that some readers are more permissive here than
> > > > others. I believe some allow reading date as an int without throwing.
> > > > Practically, readers should read both.
> > > > 3. Unfortunately, I think this is water under the bridge. As shown above
> > > > in the GitHub Issue, we went back and forth, so I don't see a lot of
> > > > value in switching this to date. All OSS implementations handle this as
> > > > an int internally, and this also aligns with hour/month/etc.
> > > >
> > > > Hope this historical context helps.
> > > >
> > > > Kind regards,
> > > > Fokko
> > > >
> > > >
> > > > On 2026/05/20 16:33:51 Andrei Tserakhau via dev wrote:
> > > > > Here is a fast follow with a PR:
> > > > > https://github.com/apache/iceberg/pull/16446
> > > > >
> > > > > Best,
> > > > > Andrei
> > > > >
> > > > > On Wed, May 20, 2026 at 6:11 PM Andrei Tserakhau <[email protected]> wrote:
> > > > >
> > > > > > Thanks for raising this, Kevin.
> > > > > >
> > > > > > Speaking as an iceberg-go maintainer, even though Go is the
> > > > > > implementation that has to move, I'd vote:
> > > > > >
> > > > > > 1. Writers SHOULD emit { "type": "int", "logicalType": "date" }.
> > > > > > 2. Readers MUST accept both plain `int` and `int` annotated with
> > > > > > `logicalType: date`.
> > > > > > 3. Keep the transform result type table as-is (`int` as the logical
> > > > > > Iceberg type). Don't change it to `date`. Add a separate, normative
> > > > > > manifest-encoding clause so projection and expression-evaluation
> > > > > > semantics that depend on the type model stay untouched.
> > > > > >
> > > > > > Reasoning: when Java, PyIceberg, and Rust all write logical `date`,
> > > > > > that's the de facto wire format. Forcing them to switch to plain `int`
> > > > > > to match a literal reading of the transform table would churn three
> > > > > > implementations and leave every existing manifest "non-conforming"
> > > > > > forever. Aligning Go with the dominant writer convention costs one
> > > > > > implementation change (PR #915 already proposes it) and zero
> > > > > > historical churn.
> > > > > >
> > > > > > The underlying ambiguity is that "result type" (logical Iceberg type)
> > > > > > and "Avro manifest encoding" (wire format) were conflated. Separating
> > > > > > them in spec text removes the ambiguity without changing the type
> > > > > > system.
> > > > > >
> > > > > > Happy to drive the spec PR and then iceberg-go writer + reader
> > > > > > alignment.
> > > > > >
> > > > > > Best,
> > > > > > Andrei
> > > > > >
> > > > > > On Tue, May 19, 2026 at 5:45 PM Kevin Liu <[email protected]> wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> I'd like to invite the community to discuss a spec ambiguity in Apache
> > > > > >> Iceberg that has caused some confusion across implementations. We've seen
> > > > > >> this come up in Python, Rust, and now Go.
> > > > > >>
> > > > > >> The issue: the spec documents the `day` partition transform's result type
> > > > > >> as plain `int`, but Java, PyIceberg, and Rust all write manifest partition
> > > > > >> fields using Avro's logical `date` type. Go currently writes plain `int`,
> > > > > >> which is the strict reading of the spec. Since both forms have the same
> > > > > >> physical representation, the difference is only the Avro schema annotation
> > > > > >> -- but it's worth clarifying the spec so all implementations are aligned.
> > > > > >>
> > > > > >> The full analysis, including a breakdown of each implementation's
> > > > > >> writer/reader behavior and proposed resolution options, is here:
> > > > > >> https://github.com/apache/iceberg/issues/16414
> > > > > >>
> > > > > >> At a high level, the questions for the community are:
> > > > > >> 1. What should implementations write: Avro `int` (plain integer) or Avro
> > > > > >> `date` (integer with a date logical type)?
> > > > > >> 2. Should implementations be required to read both forms, or just
> > > > > >> encouraged to?
> > > > > >> 3. Should the spec's transform result type table be updated from `int` to
> > > > > >> `date`?
> > > > > >>
> > > > > >> I'd love to hear your thoughts. Thanks!
> > > > > >>
> > > > > >> Best,
> > > > > >> Kevin Liu
> > > > > >>
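
For readers following the thread, the reader-tolerance rule proposed above (accept both plain `int` and `int` annotated with `logicalType: date` for a `day` partition field) could be sketched roughly as follows. This is only an illustration, not code from any Iceberg implementation: the function names `is_day_partition_schema` and `day_ordinal_to_date` are hypothetical, the schema is assumed to be an already-parsed Avro JSON schema (a string, dict, or list), and rejecting other logical types is an assumption of this sketch rather than something the thread settles.

```python
from datetime import date, timedelta

# Both encodings carry the same physical value: days since 1970-01-01.
EPOCH = date(1970, 1, 1)


def is_day_partition_schema(schema):
    """Return True if `schema` is plain Avro int or int annotated as date.

    Hypothetical helper: `schema` is a parsed Avro JSON schema.
    """
    # Optional fields are unions such as ["null", <schema>]; check the
    # single non-null branch.
    if isinstance(schema, list):
        branches = [s for s in schema if s != "null"]
        return len(branches) == 1 and is_day_partition_schema(branches[0])
    # Plain primitive form, as iceberg-go currently writes it.
    if schema == "int":
        return True
    # Object form: {"type": "int"} with either no logical type or "date".
    # Rejecting other logical types is an assumption made for this sketch.
    if isinstance(schema, dict) and schema.get("type") == "int":
        return schema.get("logicalType") in (None, "date")
    return False


def day_ordinal_to_date(days):
    """Convert the stored integer (days since epoch) to a calendar date."""
    return EPOCH + timedelta(days=days)
```

With a check like this, a reader treats `"int"` and `{"type": "int", "logicalType": "date"}` identically and only interprets the integer as a date when it needs a calendar value, which keeps the logical result type (`int`) separate from the wire annotation.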
