Thanks Tanmay, and +1 to your framing. The cases that aren't bugs -- where two implementations both follow the spec and still disagree -- are the ones per-implementation CI can't catch by construction, and writing the expected value down is what forces the ambiguity to get resolved. That's the part I care most about too.
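
For illustration only, here is a rough sketch of what "writing the expected value down" could look like for the two issues discussed below. Every name in it (`Decoded`, `Expectation`, `conforms`, the type labels, the sample value 19000) is hypothetical and not taken from the proposal doc or any implementation. It assumes the reader has already decoded the field to a logical value and can report the physical type it found.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Decoded:
    """What a reader produced: the logical value plus the physical type it was stored as."""
    value: Any
    physical_type: str  # hypothetical labels such as "int", "long", "date"


@dataclass(frozen=True)
class Expectation:
    """What a fixture writes down. pinned_type stays None where the spec leaves the encoding open."""
    value: Any
    pinned_type: Optional[str] = None


def conforms(decoded: Decoded, expected: Expectation) -> bool:
    """Compare logical values always; compare physical types only when one is pinned."""
    if decoded.value != expected.value:
        return False
    return expected.pinned_type is None or decoded.physical_type == expected.pinned_type


# Day transform (iceberg#16414): encoding left open, so int and date both conform.
day_expectation = Expectation(value=19000)

# equality_ids (iceberg-go#880): pinned to int, so a long with the same value fails.
equality_ids_expectation = Expectation(value=[1], pinned_type="int")
```

The point of the optional `pinned_type` is that a fixture author has to decide, per field, whether the encoding is part of the contract; that decision is where the ambiguity gets resolved.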

Your two points actually connect, and it's worth being precise about it:

- "decoded values, not bytes" is right for the day transform (iceberg#16414): an int vs a date logical type are two encodings of the same logical value, and a value comparison correctly treats them as equal.
- equality_ids (iceberg-go#880) is the opposite case. int vs long is a real defect, but a flat decoded-value comparison would normalize it away: decode long(1) and int(1) both to 1 and the test passes while the bug ships.

So the rule needs to be spec-semantic rather than just "decoded": normalize representation where the spec leaves it open (day transform), but retain the physical type where the spec pins one (equality_ids is int, not long).

And your input -> golden file -> expected value structure is exactly how we catch that. #880 originated on the write side, so a read-only suite over known-good goldens never exercises the writer that produced it. Structuring every fixture bidirectionally from the start -- read test = golden decodes to the expected value, write test = an implementation produces a conforming golden, including the pinned encoding -- means we don't re-author fixtures when write conformance lands. We'll build them that way from v1 even if the first cases we turn on are reads.

On the rest: agree on converging with Sung's doc. The submodule pinning, contribution/review guides, and per-impl CI integration are the integration details we're working through offline now. +1 on the community sync, happy to help organize.

Andrei

On Tue, Jun 30, 2026 at 10:07 PM Neelesh Salian <[email protected]> wrote:

> Hi folks,
>
> To keep everyone in the loop: we've (Sung, Andrei, Amogh and myself)
> connected offline to discuss this.
> We are working through the proposal to share one with the group and
> resolve any early questions.
>
> +1 to the idea of a community sync given the level of interest.
> Will share more when things are ready on our end.
>
> Thanks.
>
>
> On Mon, Jun 29, 2026 at 11:41 AM Tanmay Rauth <[email protected]>
> wrote:
>
>> Thanks Neelesh, the doc lays this out really well, and +1 to Matt. The
>> framing I'd most want to underline is one you already make: the hardest
>> cases aren't bugs, they're where two implementations both follow the spec
>> faithfully and still disagree. The day-transform field type (iceberg#16414)
>> is a good example, and I think your point that writing down the expected
>> value is what forces the ambiguity to get resolved is one of the strongest
>> motivations for the proposal. That's something per-implementation CI can
>> never do on its own.
>>
>> A couple of small things, for whatever they're worth:
>>
>> - The decoded-values-not-bytes approach feels right. The day-transform
>>   case (iceberg#16414) is exactly where a byte-level comparison would flag
>>   two valid encodings as different, while a value comparison correctly
>>   treats them as equivalent.
>> - On the open question of reads-only vs. reads+writes: iceberg-go#880
>>   actually originated on the write side (Go wrote equality_ids as long, and
>>   Java failed when reading it). It might be worth structuring each fixture
>>   as input -> golden file -> expected value from the start. The same
>>   fixture can then exercise both directions: read tests verify that the
>>   golden file decodes to the expected value, while write tests verify that
>>   an implementation produces a conforming golden file. That avoids having
>>   to re-author fixtures when write conformance is added later.
>>
>> Thanks for putting this together.
>>
>> Regards,
>> Tanmay Rauth
>>
>> On Mon, Jun 29, 2026 at 9:57 AM Sung Yun <[email protected]> wrote:
>>
>>> +1, thanks Neelesh. Linking my parallel thread and doc for anyone who
>>> wants the detail [1].
>>>
>>> Having read your write-up, I think the two are substantially the same
>>> proposal, with just narrow differences around the proposed repo layout
>>> and the integration plan. I think it's a great sign that there's already
>>> a great amount of overlap in our thoughts. I agree that a community sync
>>> sounds worthwhile, and it would also be useful to converge the two docs
>>> in parallel so we bring one proposal back here for review and convergence
>>> through lazy consensus.
>>>
>>> A few areas from my version/poc [2] I think are worth folding in as
>>> points to discuss and converge on:
>>>
>>> - Contribution/README guides for adding and reviewing fixtures.
>>> - A submodule-based integration pattern, with each implementation
>>>   pinning the fixture repo to a commit.
>>> - How each test surface is meant to be consumed and integrated by the
>>>   individual implementations in their CI.
>>>
>>> Sung
>>>
>>> [1] https://lists.apache.org/thread/964630c6q0jovs579x1jzb1t0o19jgjg
>>> [2] https://github.com/sungwy/iceberg-testing/pull/1
>>>
>>> On 2026/06/29 16:47:18 Neelesh Salian wrote:
>>> > Thanks Matt. Seems like there is interest in doing this.
>>> > Separately, Sung has a similar proposal in the community and we are
>>> > connected offline to sync and converge since the proposals are along
>>> > similar lines.
>>> > Will update this thread as we discuss.
>>> > If there are more folks interested in this, it might be worth doing a
>>> > community one-off sync to brainstorm this as well.
>>> >
>>> > On Mon, Jun 29, 2026 at 8:30 AM Matt Topol <[email protected]> wrote:
>>> >
>>> > > Thanks for the proposal! I'm gonna read through this, but I just
>>> > > wanted to chime in that this is something I've been desiring and
>>> > > hoping for for a long time. We've encountered tons of cases during
>>> > > the development of iceberg-go where implementations diverged while
>>> > > still following the letter of the spec. This kind of testing is very
>>> > > much needed.
>>> > >
>>> > > --Matt
>>> > >
>>> > > On Mon, Jun 29, 2026, 11:11 AM Neelesh Salian <[email protected]>
>>> > > wrote:
>>> > >
>>> > >> Hi all,
>>> > >>
>>> > >> Each Iceberg implementation has its own tests, but there isn't a
>>> > >> shared way to check that a table written by one is read the same way
>>> > >> by another.
>>> > >> A few examples that have come up across the implementations: a
>>> > >> manifest written by one client that another can't read, a
>>> > >> metadata.json one writer produces that another rejects because they
>>> > >> disagree on whether a field is required, and a partition transform
>>> > >> that ends up encoded more than one way across implementations. Some
>>> > >> of these turned out to be bugs, others places where the spec is
>>> > >> ambiguous.
>>> > >>
>>> > >> We think this is worth solving with some form of shared
>>> > >> cross-implementation conformance testing, and we'd like to align as a
>>> > >> community on whether to take it on and how best to start. We've
>>> > >> written up our current thinking, a possible direction, and a small
>>> > >> prototype in the doc below.
>>> > >>
>>> > >> Details, a repo design, and the interop failures we've collected:
>>> > >> https://docs.google.com/document/d/1HRcUMcrqUjo4CjGdwAIw85f7miWOGJ4ZJ90AgHbahaw/edit?usp=sharing
>>> > >>
>>> > >> Feedback welcome on whether this is worth doing and how we might get
>>> > >> started.
>>> > >>
>>> > >> Thanks,
>>> > >> Neelesh (with Andrei Tserakhau)
>>> > >>
>>> > >
>>> >
>>>
>>
