I wasn’t aware of the previous back-and-forth changes to this line in the
spec. Thanks for the extra context!

A couple of points I want to align on:
1. All implementations except Go, including Java, Python, and Rust, write
the day transform result as an Iceberg date type. That maps to the Avro
date type and is serialized as { "type": "int", "logicalType": "date" }.
2. The Go implementation writes the day transform result as an Iceberg int
type. That maps to the Avro int type and is serialized as { "type": "int" }.
3. Java, Python, and Rust can read Avro manifest partition values as either
an Avro int type or an Avro date type.
4. The Go implementation can currently read Avro manifest partition values
only as an Avro int type. This is the original issue that sparked this
conversation.
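
To make the difference between points 1 and 2 concrete, here is a rough
sketch of the two field schemas. The field name `ts_day` and the helper
names are made up for illustration, and the sketch ignores the other
attributes a real manifest field carries. It is only meant to show that
the annotation is the sole difference, while the stored value is the same
count of days since 1970-01-01.

```python
import json
from datetime import date

# Hypothetical partition field name, for illustration only.
FIELD_NAME = "ts_day"
EPOCH = date(1970, 1, 1)


def day_field_schema(annotated: bool) -> dict:
    """Build a simplified Avro field schema for a day partition value.

    annotated=True  -> int with logicalType date (Java, Python, Rust today)
    annotated=False -> plain int (Go today)
    """
    avro_type = {"type": "int"}
    if annotated:
        avro_type["logicalType"] = "date"
    return {"name": FIELD_NAME, "type": avro_type}


def days_since_epoch(d: date) -> int:
    """The physical value is identical under both schemas."""
    return (d - EPOCH).days


if __name__ == "__main__":
    print(json.dumps(day_field_schema(True)["type"]))
    print(json.dumps(day_field_schema(False)["type"]))
    print(days_since_epoch(date(2026, 5, 20)))
```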

Since the spec has gone back and forth between writing this as an Iceberg
int and an Iceberg date, I think readers must accept both. We can include
that as an implementation note.
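
As a sketch of what "accept both" could look like on the reader side,
assuming the reader sees the Avro type as either the string "int" or a
dict, and that the Avro library may hand back either an integer or a date
object for the value. The function names here are hypothetical and not
taken from any of the implementations.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def is_day_partition_type(avro_type) -> bool:
    """Accept plain int as well as int annotated with logicalType date."""
    if avro_type == "int":
        return True
    if isinstance(avro_type, dict) and avro_type.get("type") == "int":
        return avro_type.get("logicalType") in (None, "date")
    return False


def normalize_day_value(value) -> int:
    """Return days since epoch regardless of how the value was decoded."""
    if isinstance(value, date):
        return (value - EPOCH).days
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    raise TypeError(f"unexpected day partition value: {value!r}")
```

With something like this, a manifest written by any of the current
implementations normalizes to the same integer.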

I support changing the spec back to date so it matches the default behavior
for day partition values in our implementations. Go is also making the
change to write date instead of int. The other approach, updating all
implementations to match the current spec, would be a lot of work for
little value.

Hopefully this is the last time we make this change to the spec :)
Would love to hear from others.

Best,
Kevin Liu

On Wed, May 20, 2026 at 10:39 AM Fokko Driesprong <[email protected]> wrote:

> > It wouldn't be the first time we've retroactively updated the spec when
> finding inconsistencies with the current implementations :P
>
> I think generally we try to avoid this, but in this case it was changed a
> few times :P Maybe we should revert the spec change:
>
>
> https://github.com/apache/iceberg/pull/5980/changes#diff-36347a47c3bf67ea2ef6309ea96201814032d21bb5f162dfae4045508c15588a
>
> Curious to hear what others think.
>
> Kind regards,
> Fokko
>
>
> On 2026/05/20 17:24:22 Matt Topol wrote:
> > It wouldn't be the first time we've retroactively updated the spec
> > when finding inconsistencies with the current implementations :P
> >
> > Particularly, in this case even the "reference implementation" (i.e.
> > Java) is technically not spec-compliant since the spec says that it
> > should be an "int", not an Avro "date" type. If all the
> > implementations currently write a "date" type, then it's silly to have
> > to say that every implementation is violating the spec.
> >
> > If we want the spec to say it should be an int, but tolerate reading
> > an Avro "date" type, that's fine. But that would mean we should update
> > Java, Rust, and PyIceberg to all write plain "int" and no longer write
> > the "date" type, again: it would be silly to say that the reference
> > implementation and 2 other implementations are not following the spec.
> > :P
> >
> > I agree that it would be a big change for little value to update the
> > implementations, so my opinion is that the spec should be updated to
> > either say that "either" is allowed to be written, or that "date"
> > should be written but "int" should be allowed to be read.
> >
> > --Matt
> >
> > On Wed, May 20, 2026 at 1:05 PM Fokko Driesprong <[email protected]>
> wrote:
> > >
> > > Thanks for the quick PR Andrei.
> > >
> > > The problem is that the note conflicts with the Avro/Iceberg types
> table: https://iceberg.apache.org/spec/#avro
> > >
> > > I don't think we want to update the implementations as I agree that it
> would be a big change for little value. At the same time, I don't think we
> can retroactively update the spec. Maybe an implementation note would be a
> better solution to halt the tradition?
> > >
> > > Kind regards,
> > > Fokko
> > >
> > >
> > > On 2026/05/20 16:49:29 Andrei Tserakhau via dev wrote:
> > > > Thanks, Fokko, for the historical context!
> > > >
> > > > Quick check that we're aligned, since I think we may be closer than
> > > > it reads:
> > > >
> > > > My PR leaves the result type table as `int` -- no change to the
> > > > transform table, no impact on hour/month/etc., no change to the
> > > > type model.
> > > >
> > > > What the PR clarifies is the Avro encoding used when serializing a
> > > > `day` partition field into a manifest. Empirically today, Java,
> > > > PyIceberg, and Rust all write `{ "type": "int", "logicalType":
> "date" }`
> > > > there (TypeToSchema in Java, DayTransform.result_type in PyIceberg,
> > > > Transform::Day.result_type in Rust all produce a Date). Only
> > > > iceberg-go produces plain Avro `int`. The PR codifies the de facto
> > > > writer behavior as SHOULD and makes reader tolerance MUST.
> > > >
> > > > If your "stick with int" also covers the Avro annotation, then we'd
> > > > effectively be reverting three writers and orphaning every existing
> > > > manifest, which I don't think is a decent path; it's quite a big
> > > > change for small benefits.
> > > >
> > > > Either way, I'm super happy to adjust the spec wording; the goal is
> > > > to stop this tradition of re-litigating the issue every year because
> > > > this part of the spec gets misread.
> > > >
> > > > Best,
> > > > Andrei
> > > >
> > > > On Wed, May 20, 2026 at 6:37 PM Fokko Driesprong <[email protected]>
> wrote:
> > > >
> > > > > Thanks for bringing this up Kevin, a gift that keeps on giving :)
> > > > >
> https://github.com/apache/iceberg/issues/10616#issuecomment-2200191427
> > > > >
> > > > > 1. I think we should stick with the int type as defined in the
> spec.
> > > > > 2. It feels to me that some readers are more permissive here than
> others.
> > > > > I believe some allow reading date as an int without throwing.
> Practically,
> > > > > readers should read both.
> > > > > 3. Unfortunately, I think this is water under the bridge. As shown
> above in
> > > > > the GitHub Issue, we went back and forth, so I don't see a lot of
> value in
> > > > > switching this to date. All OSS implementations handle this as an
> int
> > > > > internally, and this also aligns with hour/month/etc.
> > > > >
> > > > > Hope this historical context helps.
> > > > >
> > > > > Kind regards,
> > > > > Fokko
> > > > >
> > > > >
> > > > > On 2026/05/20 16:33:51 Andrei Tserakhau via dev wrote:
> > > > > > Here is a fast follow with a PR:
> > > > > > https://github.com/apache/iceberg/pull/16446
> > > > > >
> > > > > > Best,
> > > > > > Andrei
> > > > > >
> > > > > > On Wed, May 20, 2026 at 6:11 PM Andrei Tserakhau <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > Thanks for raising this, Kevin.
> > > > > > >
> > > > > > > Speaking as an iceberg-go maintainer, even though Go is the
> > > > > > > implementation that has to move, I'd vote:
> > > > > > >
> > > > > > > 1. Writers SHOULD emit { "type": "int", "logicalType": "date"
> }.
> > > > > > > 2. Readers MUST accept both plain `int` and `int` annotated
> with
> > > > > > >    `logicalType: date`.
> > > > > > > 3. Keep the transform result type table as-is (`int` as the
> logical
> > > > > > >    Iceberg type). Don't change it to `date`. Add a separate,
> normative
> > > > > > >    manifest-encoding clause so projection and
> expression-evaluation
> > > > > > >    semantics that depend on the type model stay untouched.
> > > > > > >
> > > > > > > Reasoning: when Java, PyIceberg, and Rust all write logical
> `date`,
> > > > > > > that's the de facto wire format. Forcing them to switch to
> plain `int`
> > > > > > > to match a literal reading of the transform table would churn
> three
> > > > > > > implementations and leave every existing manifest
> "non-conforming"
> > > > > > > forever. Aligning Go with the dominant writer convention costs
> one
> > > > > > > implementation change (PR #915 already proposes it) and zero
> historical
> > > > > > > churn.
> > > > > > >
> > > > > > > The underlying ambiguity is that "result type" (logical
> Iceberg type)
> > > > > > > and "Avro manifest encoding" (wire format) were conflated.
> Separating
> > > > > > > them in spec text removes the ambiguity without changing the
> type
> > > > > > > system.
> > > > > > >
> > > > > > > Happy to drive the spec PR and then iceberg-go writer + reader
> > > > > > > alignment.
> > > > > > >
> > > > > > > Best,
> > > > > > > Andrei
> > > > > > >
> > > > > > > On Tue, May 19, 2026 at 5:45 PM Kevin Liu <
> [email protected]>
> > > > > wrote:
> > > > > > >
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> I'd like to invite the community to discuss a spec ambiguity
> in Apache
> > > > > > >> Iceberg that has caused some confusion across
> implementations. We've
> > > > > seen
> > > > > > >> this come up in Python, Rust, and now Go.
> > > > > > >>
> > > > > > >> The issue: the spec documents the `day` partition transform's
> result
> > > > > type
> > > > > > >> as plain `int`, but Java, PyIceberg, and Rust all write
> manifest
> > > > > partition
> > > > > > >> fields using Avro's logical `date` type. Go currently writes
> plain
> > > > > `int`,
> > > > > > >> which is the strict reading of the spec. Since both forms
> have the
> > > > > same
> > > > > > >> physical representation, the difference is only the Avro
> schema
> > > > > annotation
> > > > > > >> -- but it's worth clarifying the spec so all implementations
> are
> > > > > aligned.
> > > > > > >>
> > > > > > >> The full analysis, including a breakdown of each
> implementation's
> > > > > > >> writer/reader behavior and proposed resolution options, is
> here:
> > > > > > >> https://github.com/apache/iceberg/issues/16414
> > > > > > >>
> > > > > > >> At a high level, the questions for the community are:
> > > > > > >> 1. What should implementations write: Avro `int` (plain
> integer) or
> > > > > Avro
> > > > > > >> `date` (integer with a date logical type)?
> > > > > > >> 2. Should implementations be required to read both forms, or
> just
> > > > > > >> encouraged to?
> > > > > > >> 3. Should the spec's transform result type table be updated
> from
> > > > > `int` to
> > > > > > >> `date`?
> > > > > > >>
> > > > > > >> I'd love to hear your thoughts. Thanks!
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Kevin Liu
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>
