Hi folks,

To keep everyone in the loop: Sung, Andrei, Amogh, and I have connected offline to discuss this. We are working through the proposals so that we can share a single one with the group and resolve any early questions.
+1 to the idea of a community sync given the level of interest. Will share more when things are ready on our end.

Thanks.

On Mon, Jun 29, 2026 at 11:41 AM Tanmay Rauth <[email protected]> wrote:

> Thanks Neelesh, the doc lays this out really well, and +1 to Matt. The
> framing I'd most want to underline is one you already make: the hardest
> cases aren't bugs, they're where two implementations both follow the spec
> faithfully and still disagree. The day-transform field type (iceberg#16414)
> is a good example, and I think your point that writing down the expected
> value is what forces the ambiguity to get resolved is one of the strongest
> motivations for the proposal. That's something per-implementation CI can
> never do on its own.
>
> A couple of small things, for whatever they're worth:
>
> - The decoded-values-not-bytes approach feels right. The day-transform
>   case (iceberg#16414) is exactly where a byte-level comparison would flag
>   two valid encodings as different, while a value comparison correctly treats
>   them as equivalent.
> - On the open question of reads-only vs. reads+writes: iceberg-go#880
>   actually originated on the write side (Go wrote equality_ids as long, and
>   Java failed when reading it). It might be worth structuring each fixture
>   as input -> golden file -> expected value from the start. The same fixture
>   can then exercise both directions: read tests verify that the golden file
>   decodes to the expected value, while write tests verify that an
>   implementation produces a conforming golden file. That avoids having to
>   re-author fixtures when write conformance is added later.
>
> Thanks for putting this together.
>
> Regards,
> Tanmay Rauth
>
> On Mon, Jun 29, 2026 at 9:57 AM Sung Yun <[email protected]> wrote:
>
>> +1, thanks Neelesh. Linking my parallel thread and doc for anyone who
>> wants the detail [1].
>>
>> Having read your write-up, I think the two are substantially the same
>> proposal, with only narrow differences around the proposed repo layout and
>> the integration plan. I think it's a great sign that there's already a great
>> amount of overlap in our thoughts. I agree that a community sync sounds
>> worthwhile, and it would also be useful to converge the two docs in
>> parallel so we bring one proposal back here for review and convergence
>> through lazy consensus.
>>
>> A few areas from my version/poc [2] I think are worth folding in as
>> points to discuss and converge on:
>>
>> - Contribution/README guides for adding and reviewing fixtures.
>> - A submodule-based integration pattern, with each implementation pinning
>>   the fixture repo to a commit.
>> - How each test surface is meant to be consumed and integrated by the
>>   individual implementations in their CI.
>>
>> Sung
>>
>> [1] https://lists.apache.org/thread/964630c6q0jovs579x1jzb1t0o19jgjg
>> [2] https://github.com/sungwy/iceberg-testing/pull/1
>>
>> On 2026/06/29 16:47:18 Neelesh Salian wrote:
>> > Thanks Matt. Seems like there is interest in doing this.
>> > Separately, Sung has a similar proposal in the community and we are
>> > connected offline to sync and converge since the proposals are along
>> > similar lines.
>> > Will update this thread as we discuss.
>> > If there are more folks interested in this, it might be worth doing a
>> > one-off community sync to brainstorm this as well.
>> >
>> > On Mon, Jun 29, 2026 at 8:30 AM Matt Topol <[email protected]> wrote:
>> >
>> > > Thanks for the proposal! I'm gonna read through this, but I just wanted to
>> > > chime in that this is something I've been desiring and hoping for for a
>> > > long time. We've encountered tons of cases during the development of
>> > > iceberg-go where implementations diverged while still following the letter
>> > > of the spec. This kind of testing is very much needed.
>> > >
>> > > --Matt
>> > >
>> > > On Mon, Jun 29, 2026, 11:11 AM Neelesh Salian <[email protected]>
>> > > wrote:
>> > >
>> > >> Hi all,
>> > >>
>> > >> Each Iceberg implementation has its own tests, but there isn't a shared
>> > >> way to check that a table written by one is read the same way by another.
>> > >> A few examples that have come up across the implementations: a manifest
>> > >> written by one client that another can't read, a metadata.json one writer
>> > >> produces that another rejects because they disagree on whether a field is
>> > >> required, and a partition transform that ends up encoded more than one way
>> > >> across implementations. Some of these turned out to be bugs, others places
>> > >> where the spec is ambiguous.
>> > >>
>> > >> We think this is worth solving with some form of shared
>> > >> cross-implementation conformance testing, and we'd like to align as a
>> > >> community on whether to take it on and how best to start. We've written up
>> > >> our current thinking, a possible direction, and a small prototype in the
>> > >> doc below.
>> > >>
>> > >> Details, a repo design, and the interop failures we've collected:
>> > >> https://docs.google.com/document/d/1HRcUMcrqUjo4CjGdwAIw85f7miWOGJ4ZJ90AgHbahaw/edit?usp=sharing
>> > >>
>> > >> Feedback welcome on whether this is worth doing and how we might get
>> > >> started.
>> > >>
>> > >> Thanks,
>> > >> Neelesh (with Andrei Tserakhau)
