Re: [DISCUSS] Cross-implementation conformance testing for Iceberg

Andrei Tserakhau via dev Tue, 30 Jun 2026 14:27:31 -0700

Thanks Tanmay, and +1 to your framing. The cases that aren't bugs -- where
two implementations both follow the spec and still disagree -- are the ones
per-implementation CI can't catch by construction, and writing the expected
value down is what forces the ambiguity to get resolved. That's the part I
care most about too.


Your two points actually connect, and it's worth being precise about it:

- "decoded values, not bytes" is right for the day transform
  (iceberg#16414): an int and a date logical type are two encodings of the
  same logical value, and a value comparison correctly treats them as equal.

- equality_ids (iceberg-go#880) is the opposite case. int vs long is a real
  defect, but a flat decoded-value comparison would normalize it away:
  decode long(1) and int(1) both to 1, and the test passes while the bug
  ships. So the rule needs to be spec-semantic rather than just "decoded":
  normalize the representation where the spec leaves it open (day
  transform), but retain the physical type where the spec pins one
  (equality_ids is int, not long).

And your input -> golden file -> expected value structure is exactly how we
catch that. #880 originated on the write side, so a read-only suite over
known-good goldens never exercises the writer that produced it. Structuring
every fixture bidirectionally from the start (read test: the golden decodes
to the expected value; write test: an implementation produces a conforming
golden, including the pinned encoding) means we don't have to re-author
fixtures when write conformance lands. We'll build them that way from v1,
even if the first cases we turn on are reads.
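A rough Python sketch of what such a bidirectional fixture could look like.
Everything here is hypothetical: the `Fixture` layout, `read_test` /
`write_test`, and the toy JSON "golden" format (which stores a type tag next
to each value) stand in for real manifests and metadata files. The buggy
writer mimics the #880 failure mode (right value, wrong physical type), and
only the write direction catches it.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

Decoded = dict[str, tuple[str, Any]]  # field -> (physical type, value)


@dataclass(frozen=True)
class Fixture:
    """One conformance case: input -> golden file -> expected value."""
    name: str
    input_data: dict[str, Any]  # what a writer is asked to write
    golden: bytes               # known-good file, checked into the fixture repo
    expected: Decoded           # what every reader must decode, pinned types included


def read_test(fx: Fixture, decode: Callable[[bytes], Decoded]) -> bool:
    """Read direction: the golden decodes to the expected value."""
    return decode(fx.golden) == fx.expected


def write_test(fx: Fixture, write: Callable[[dict[str, Any]], bytes],
               decode: Callable[[bytes], Decoded]) -> bool:
    """Write direction: the implementation's own output must meet the same
    expectation as the golden, so a pinned-type violation fails here."""
    return decode(write(fx.input_data)) == fx.expected


# --- Toy stand-ins for a real implementation (hypothetical) ---
def toy_decode(blob: bytes) -> Decoded:
    raw = json.loads(blob)
    return {k: (v["type"], v["value"]) for k, v in raw.items()}


def good_writer(inp: dict[str, Any]) -> bytes:
    return json.dumps({k: {"type": "int", "value": v} for k, v in inp.items()}).encode()


def buggy_writer(inp: dict[str, Any]) -> bytes:
    # Right value, wrong physical type: the iceberg-go#880 failure mode.
    return json.dumps({k: {"type": "long", "value": v} for k, v in inp.items()}).encode()


EQ_IDS = Fixture(
    name="data-file-equality-ids",
    input_data={"equality_ids": [1]},
    golden=b'{"equality_ids": {"type": "int", "value": [1]}}',
    expected={"equality_ids": ("int", [1])},
)

assert read_test(EQ_IDS, toy_decode)                     # reader passes on the golden
assert write_test(EQ_IDS, good_writer, toy_decode)       # conforming writer passes
assert not write_test(EQ_IDS, buggy_writer, toy_decode)  # #880-style writer fails
```

The same checked-in golden and expected value serve both directions, so
turning on write conformance later means adding a `write_test` call per
implementation rather than new fixtures.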

On the rest: I agree on converging with Sung's doc. The submodule pinning,
contribution/review guides, and per-impl CI integration are the integration
details we're working through offline now. +1 on the community sync; happy
to help organize.

Andrei

On Tue, Jun 30, 2026 at 10:07 PM Neelesh Salian <[email protected]>
wrote:

> Hi folks,
>
> To keep everyone in the loop: we (Sung, Andrei, Amogh and I) have
> connected offline to discuss this.
> We are working through the proposal so we can share a single version with
> the group and resolve any early questions.
>
> +1 to the idea of a community sync given the level of interest.
> Will share more when things are ready on our end.
>
> Thanks.
>
>
> On Mon, Jun 29, 2026 at 11:41 AM Tanmay Rauth <[email protected]>
> wrote:
>
>> Thanks Neelesh, the doc lays this out really well, and +1 to Matt. The
>> framing I'd most want to underline is one you already make: the hardest
>> cases aren't bugs; they're where two implementations both follow the spec
>> faithfully and still disagree. The day-transform field type (iceberg#16414)
>> is a good example, and I think your point that writing down the expected
>> value is what forces the ambiguity to get resolved is one of the strongest
>> motivations for the proposal. That's something per-implementation CI can
>> never do on its own.
>>
>> A couple of small things, for whatever they're worth:
>>
>> - The decoded-values-not-bytes approach feels right. The day-transform
>> case (iceberg#16414) is exactly where a byte-level comparison would flag
>> two valid encodings as different, while a value comparison correctly treats
>> them as equivalent.
>> - On the open question of reads-only vs. reads+writes: iceberg-go#880
>> actually originated on the write side (Go wrote equality_ids as long, and
>> Java failed when reading it). It might be worth structuring each fixture
>> as input -> golden file -> expected value from the start. The same fixture
>> can then exercise both directions: read tests verify that the golden file
>> decodes to the expected value, while write tests verify that an
>> implementation produces a conforming golden file. That avoids having to
>> re-author fixtures when write conformance is added later.
>>
>> Thanks for putting this together.
>>
>> Regards,
>> Tanmay Rauth
>>
>> On Mon, Jun 29, 2026 at 9:57 AM Sung Yun <[email protected]> wrote:
>>
>>> +1, thanks Neelesh. Linking my parallel thread and doc for anyone who
>>> wants the detail [1].
>>>
>>> Having read your write-up, I think the two are substantially the same
>>> proposal, with only narrow differences around the proposed repo layout
>>> and the integration plan. I think it's a great sign that there's already
>>> so much overlap in our thoughts. I agree that a community sync sounds
>>> worthwhile, and it would also be useful to converge the two docs in
>>> parallel so we bring one proposal back here for review and convergence
>>> through lazy consensus.
>>>
>>> A few areas from my version/poc [2] I think are worth folding in as
>>> points to discuss and converge on:
>>>
>>> - Contribution/README guides for adding and reviewing fixtures.
>>> - A submodule-based integration pattern, with each implementation
>>> pinning the fixture repo to a commit.
>>> - How each test surface is meant to be consumed and integrated by the
>>> individual implementations in their CI.
>>>
>>> Sung
>>>
>>> [1] https://lists.apache.org/thread/964630c6q0jovs579x1jzb1t0o19jgjg
>>> [2] https://github.com/sungwy/iceberg-testing/pull/1
>>>
>>> On 2026/06/29 16:47:18 Neelesh Salian wrote:
>>> > Thanks Matt. Seems like there is interest in doing this.
>>> > Separately, Sung has a similar proposal in the community, and we have
>>> > connected offline to sync and converge since the proposals are along
>>> > similar lines.
>>> > Will update this thread as we discuss.
>>> > If there are more folks interested in this, it might be worth doing a
>>> > one-off community sync to brainstorm this as well.
>>> >
>>> > On Mon, Jun 29, 2026 at 8:30 AM Matt Topol <[email protected]> wrote:
>>> >
>>> > > Thanks for the proposal! I'm gonna read through this, but I just
>>> > > wanted to chime in that this is something I've been desiring and
>>> > > hoping for for a long time. We've encountered tons of cases during the
>>> > > development of iceberg-go where implementations diverged while still
>>> > > following the letter of the spec. This kind of testing is very much
>>> > > needed.
>>> > >
>>> > > --Matt
>>> > >
>>> > > On Mon, Jun 29, 2026, 11:11 AM Neelesh Salian <[email protected]> wrote:
>>> > >
>>> > >> Hi all,
>>> > >>
>>> > >> Each Iceberg implementation has its own tests, but there isn't a
>>> > >> shared way to check that a table written by one is read the same way
>>> > >> by another.
>>> > >> A few examples that have come up across the implementations: a
>>> > >> manifest written by one client that another can't read, a
>>> > >> metadata.json one writer produces that another rejects because they
>>> > >> disagree on whether a field is required, and a partition transform
>>> > >> that ends up encoded more than one way across implementations. Some
>>> > >> of these turned out to be bugs; others were places where the spec is
>>> > >> ambiguous.
>>> > >>
>>> > >> We think this is worth solving with some form of shared
>>> > >> cross-implementation conformance testing, and we'd like to align as a
>>> > >> community on whether to take it on and how best to start. We've
>>> > >> written up our current thinking, a possible direction, and a small
>>> > >> prototype in the doc below.
>>> > >>
>>> > >> Details, a repo design, and the interop failures we've collected:
>>> > >> https://docs.google.com/document/d/1HRcUMcrqUjo4CjGdwAIw85f7miWOGJ4ZJ90AgHbahaw/edit?usp=sharing
>>> > >>
>>> > >>
>>> > >> Feedback welcome on whether this is worth doing and how we might get
>>> > >> started.
>>> > >>
>>> > >> Thanks,
>>> > >> Neelesh (with Andrei Tserakhau)
>>> > >>
>>> > >
>>> >
>>>
>>
