Hi folks,

To keep everyone in the loop: Sung, Andrei, Amogh, and I have connected
offline to discuss this.
We are working through the proposals so that we can share a single one with
the group and resolve any early questions.

+1 to the idea of a community sync given the level of interest.
Will share more when things are ready on our end.

Thanks.


On Mon, Jun 29, 2026 at 11:41 AM Tanmay Rauth <[email protected]> wrote:

> Thanks Neelesh, the doc lays this out really well, and +1 to Matt. The
> framing I'd most want to underline is one you already make: the hardest
> cases aren't bugs, they're where two implementations both follow the spec
> faithfully and still disagree. The day-transform field type (iceberg#16414)
> is a good example, and I think your point that writing down the expected
> value is what forces the ambiguity to get resolved is one of the strongest
> motivations for the proposal. That's something per-implementation CI can
> never do on its own.
>
> A couple of small things, for whatever they're worth:
>
> - The decoded-values-not-bytes approach feels right. The day-transform
> case (iceberg#16414) is exactly where a byte-level comparison would flag
> two valid encodings as different, while a value comparison correctly treats
> them as equivalent.
> - On the open question of reads-only vs. reads+writes: iceberg-go#880
> actually originated on the write side (Go wrote equality_ids as long, and
> Java failed when reading it). It might be worth structuring each fixture
> as input -> golden file -> expected value from the start. The same fixture
> can then exercise both directions: read tests verify that the golden file
> decodes to the expected value, while write tests verify that an
> implementation produces a conforming golden file. That avoids having to
> re-author fixtures when write conformance is added later.
>
> Thanks for putting this together.
>
> Regards,
> Tanmay Rauth
>
> On Mon, Jun 29, 2026 at 9:57 AM Sung Yun <[email protected]> wrote:
>
>> +1, thanks Neelesh. Linking my parallel thread and doc for anyone who
>> wants the detail [1].
>>
>> Having read your write-up, I think the two are substantially the same
>> proposal, with just narrow differences around the proposed repo layout and the
>> integration plan. I think it's a great sign that there's already a great
>> amount of overlap in our thoughts. I agree that a community sync sounds
>> worthwhile, and it would also be useful to converge the two docs in
>> parallel so we bring one proposal back here for review and convergence
>> through lazy consensus.
>>
>> A few areas from my version/poc [2] I think are worth folding in as
>> points to discuss and converge on:
>>
>> - Contribution/README guides for adding and reviewing fixtures.
>> - A submodule-based integration pattern, with each implementation pinning
>> the fixture repo to a commit.
>> - How each test surface is meant to be consumed and integrated by the
>> individual implementations in their CI.
>>
>> Sung
>>
>> [1] https://lists.apache.org/thread/964630c6q0jovs579x1jzb1t0o19jgjg
>> [2] https://github.com/sungwy/iceberg-testing/pull/1
>>
>> On 2026/06/29 16:47:18 Neelesh Salian wrote:
>> > Thanks Matt. Seems like there is interest in doing this.
>> > Separately, Sung has a similar proposal in the community, and we have
>> > connected offline to sync and converge since the proposals are along
>> > similar lines.
>> > Will update this thread as we discuss.
>> > If there are more folks interested in this, it might be worth doing a
>> > community one-off sync to brainstorm this as well.
>> >
>> > On Mon, Jun 29, 2026 at 8:30 AM Matt Topol <[email protected]> wrote:
>> >
>> > > Thanks for the proposal! I'm gonna read through this, but I just wanted to
>> > > chime in that this is something I've been desiring and hoping for for a
>> > > long time. We've encountered tons of cases during the development of
>> > > iceberg-go where implementations diverged while still following the letter
>> > > of the spec. This kind of testing is very much needed.
>> > >
>> > > --Matt
>> > >
>> > > On Mon, Jun 29, 2026, 11:11 AM Neelesh Salian <[email protected]> wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> Each Iceberg implementation has its own tests, but there isn't a shared
>> > >> way to check that a table written by one is read the same way by another.
>> > >> A few examples that have come up across the implementations: a manifest
>> > >> written by one client that another can't read,
>> > >> a metadata.json one writer produces that another rejects because they
>> > >> disagree on whether a field is required, and a partition transform that
>> > >> ends up encoded more than one way across implementations. Some of these
>> > >> turned out to be bugs, others places where the spec is ambiguous.
>> > >>
>> > >> We think this is worth solving with some form of shared
>> > >> cross-implementation conformance testing, and we'd like to align as a
>> > >> community on whether to take it on and how best to start. We've written up
>> > >> our current thinking, a possible direction, and a small prototype in the
>> > >> doc below.
>> > >>
>> > >> Details, a repo design, and the interop failures we've collected:
>> > >>
>> > >> https://docs.google.com/document/d/1HRcUMcrqUjo4CjGdwAIw85f7miWOGJ4ZJ90AgHbahaw/edit?usp=sharing
>> > >>
>> > >>
>> > >> Feedback welcome on whether this is worth doing and how we might get
>> > >> started.
>> > >>
>> > >> Thanks,
>> > >> Neelesh (with Andrei Tserakhau)
>> > >>
>> > >
>> >
>>
>
