+1, I’m broadly aligned with this proposal. I think having a reference physical artifact to then compare against is valuable.
My team has been working on a few sets of tests of a similar nature. For us, the motivation has been the correctness of maintenance operations. I'll share a bit about them here in case it's relevant, and to show how they might complement this proposal.

The first set tackles the question: does the compaction/replace operation result in the same logical data? We generate a table using Spark and then run a compaction operation through a runner harness API similar to the one proposed in this doc (i.e. "please compact table X"; the runner could be PySpark, or an API call to some maintenance service). Afterwards, we run a few common query engines (Spark, DuckDB, etc.) and verify two things. First, that they agree on an order-independent checksum, the row count, and NDVs. Second, that they return the same presence or absence for sampled rows, which exercises the metadata path.

Second, we have a table builder that uses the small model property (from formal methods) to build an exhaustive set of table layouts, which we can use as input to the tests above. For example, one test case is 3 physical rows across 2 data files, with 2 rows deleted across 2 positional delete files.

Third, we have a validator, written in Rust, that loads a metadata DAG (the metadata JSON, manifest lists, and manifests) and validates a set of invariants. For example: data/file sequence numbers in manifest entries are optional ONLY for entries with Added status, and sequence numbers are non-negative. For the replace operation: table-uuid is immutable, schema fields don't change, etc. It's a little imperfect because it relies on iceberg-rust, which very helpfully swaps out some things under the hood, but that means we aren't testing the actual physical artifacts.

Happy to chat more about any of these. My colleague and I have been working on getting the first two into shape to publish on GitHub.

Danny

On 2026/06/29 18:40:43 Tanmay Rauth wrote:
> Thanks Neelesh, the doc lays this out really well, and +1 to Matt.
> The framing I'd most want to underline is one you already make: the hardest cases aren't bugs, they're where two implementations both follow the spec faithfully and still disagree. The day-transform field type (iceberg#16414) is a good example, and I think your point that writing down the expected value is what forces the ambiguity to get resolved is one of the strongest motivations for the proposal. That's something per-implementation CI can never do on its own.
>
> A couple of small things, for whatever they're worth:
>
> - The decoded-values-not-bytes approach feels right. The day-transform case (iceberg#16414) is exactly where a byte-level comparison would flag two valid encodings as different, while a value comparison correctly treats them as equivalent.
> - On the open question of reads-only vs. reads+writes: iceberg-go#880 actually originated on the write side (Go wrote equality_ids as long, and Java failed when reading it). It might be worth structuring each fixture as input -> golden file -> expected value from the start. The same fixture can then exercise both directions: read tests verify that the golden file decodes to the expected value, while write tests verify that an implementation produces a conforming golden file. That avoids having to re-author fixtures when write conformance is added later.
>
> Thanks for putting this together.
>
> Regards,
> Tanmay Rauth
>
> On Mon, Jun 29, 2026 at 9:57 AM Sung Yun <[email protected]> wrote:
>
> > +1, thanks Neelesh. Linking my parallel thread and doc for anyone who wants the detail [1].
> >
> > Having read your write-up, I think the two are substantially the same proposal, with just narrow differences around the proposed repo layout and the integration plan. I think it's a great sign that there's already so much overlap in our thinking.
> > I agree that a community sync sounds worthwhile. It would also be useful to converge the two docs in parallel, so that we bring one proposal back here for review and convergence through lazy consensus.
> >
> > A few areas from my version/PoC [2] that I think are worth folding in as points to discuss and converge on:
> >
> > - Contribution/README guides for adding and reviewing fixtures.
> > - A submodule-based integration pattern, with each implementation pinning the fixture repo to a commit.
> > - How each test surface is meant to be consumed and integrated by the individual implementations in their CI.
> >
> > Sung
> >
> > [1] https://lists.apache.org/thread/964630c6q0jovs579x1jzb1t0o19jgjg
> > [2] https://github.com/sungwy/iceberg-testing/pull/1
> >
> > On 2026/06/29 16:47:18 Neelesh Salian wrote:
> > > Thanks Matt. Seems like there is interest in doing this.
> > > Separately, Sung has a similar proposal in the community, and we have connected offline to sync and converge, since the proposals are along similar lines.
> > > Will update this thread as we discuss.
> > > If there are more folks interested in this, it might be worth doing a one-off community sync to brainstorm this as well.
> > >
> > > On Mon, Jun 29, 2026 at 8:30 AM Matt Topol <[email protected]> wrote:
> > >
> > > > Thanks for the proposal! I'm gonna read through this, but I just wanted to chime in that this is something I've been desiring and hoping for for a long time. We've encountered tons of cases during the development of iceberg-go where implementations diverged while still following the letter of the spec. This kind of testing is very much needed.
> > > >
> > > > --Matt
> > > >
> > > > On Mon, Jun 29, 2026, 11:11 AM Neelesh Salian <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Each Iceberg implementation has its own tests, but there isn't a shared way to check that a table written by one is read the same way by another.
> > > > > A few examples that have come up across the implementations: a manifest written by one client that another can't read, a metadata.json that one writer produces and another rejects because they disagree on whether a field is required, and a partition transform that ends up encoded more than one way across implementations. Some of these turned out to be bugs, others places where the spec is ambiguous.
> > > > >
> > > > > We think this is worth solving with some form of shared cross-implementation conformance testing, and we'd like to align as a community on whether to take it on and how best to start. We've written up our current thinking, a possible direction, and a small prototype in the doc below.
> > > > >
> > > > > Details, a repo design, and the interop failures we've collected:
> > > > > https://docs.google.com/document/d/1HRcUMcrqUjo4CjGdwAIw85f7miWOGJ4ZJ90AgHbahaw/edit?usp=sharing
> > > > >
> > > > > Feedback welcome on whether this is worth doing and how we might get started.
> > > > >
> > > > > Thanks,
> > > > > Neelesh (with Andrei Tserakhau)
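The first set of checks in Danny's message above (cross-engine agreement on an order-independent checksum, row count, NDVs, and sampled-row presence) can be illustrated with the minimal Python sketch below. It is not Danny's harness. It assumes each engine's result has already been collected and normalized into tuples of plain Python values with identical types across engines. The names `table_fingerprint`, `sampled_presence`, and `engines_agree` are hypothetical. A real harness would presumably send the probes to each engine as filtered queries, so that file and partition pruning is exercised. Here they are evaluated over the collected rows for simplicity.

```python
import hashlib
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical helpers. Rows are assumed to be normalized tuples of plain
# Python values, with the same types no matter which engine produced them.
Row = Tuple[object, ...]

MASK_64 = (1 << 64) - 1


def row_hash(row: Row) -> int:
    """Map one row to a 64-bit integer using SHA-256 over its repr()."""
    digest = hashlib.sha256(repr(tuple(row)).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def table_fingerprint(rows: Iterable[Row]) -> dict:
    """Order-independent summary: row count, summed row hashes, per-column NDV."""
    count = 0
    checksum = 0
    distinct: List[set] = []
    for row in rows:
        count += 1
        # Addition mod 2**64 is commutative, so row order does not matter.
        # Unlike XOR, it does not cancel out duplicate rows.
        checksum = (checksum + row_hash(row)) & MASK_64
        if not distinct:
            distinct = [set() for _ in row]
        for i, value in enumerate(row):
            distinct[i].add(value)
    return {"row_count": count, "checksum": checksum, "ndv": [len(s) for s in distinct]}


def sampled_presence(rows: Iterable[Row], probes: Sequence[Row]) -> List[bool]:
    """For each probe row, report whether it appears in the result."""
    present = {tuple(r) for r in rows}
    return [tuple(p) in present for p in probes]


def engines_agree(results: Dict[str, List[Row]], probes: Sequence[Row] = ()) -> bool:
    """True if every engine yields the same fingerprint and the same probe answers."""
    summaries = [
        (table_fingerprint(rows), sampled_presence(rows, probes))
        for rows in results.values()
    ]
    return all(s == summaries[0] for s in summaries)
```

For example, `engines_agree({"spark": [(1, "a"), (2, "b"), (3, "b")], "duckdb": [(3, "b"), (1, "a"), (2, "b")]}, probes=[(2, "b"), (4, "z")])` returns `True`, because the two results differ only in row order. Summing hashes rather than XOR-ing them is a deliberate choice. It makes the checksum sensitive to duplicate rows, which matters when checking that a compaction neither drops nor duplicates data.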

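The invariants listed for Danny's Rust validator above can be sketched in Python as well. This is not his implementation and not iceberg-rust's API. It assumes manifest entries have already been decoded from Avro into dicts keyed by the spec's field names (`status`, `sequence_number`, `file_sequence_number`). It also assumes table metadata is the parsed metadata.json (`table-uuid`, `current-schema-id`, `schemas`). The function names are hypothetical.

```python
from typing import Any, List, Mapping

# manifest_entry.status values defined by the Iceberg table spec.
EXISTING, ADDED, DELETED = 0, 1, 2


def check_manifest_entry(entry: Mapping[str, Any]) -> List[str]:
    """Check sequence-number invariants on one decoded manifest entry."""
    errors = []
    status = entry.get("status")
    if status not in (EXISTING, ADDED, DELETED):
        errors.append(f"unknown status {status!r}")
    for field in ("sequence_number", "file_sequence_number"):
        value = entry.get(field)
        if value is None:
            # Null is only allowed for ADDED entries, whose sequence numbers
            # are inherited from the manifest's sequence number.
            if status != ADDED:
                errors.append(f"{field} is null but status is {status!r}, not ADDED")
        elif value < 0:
            errors.append(f"{field} is negative: {value}")
    return errors


def current_schema_fields(metadata: Mapping[str, Any]) -> List[Any]:
    """Return the field list of the current schema in a parsed metadata.json."""
    schema_id = metadata["current-schema-id"]
    for schema in metadata["schemas"]:
        if schema["schema-id"] == schema_id:
            return schema["fields"]
    raise KeyError(f"current-schema-id {schema_id} not found in schemas")


def check_replace(before: Mapping[str, Any], after: Mapping[str, Any]) -> List[str]:
    """Check invariants that a replace (e.g. compaction) commit must preserve."""
    errors = []
    if before.get("table-uuid") != after.get("table-uuid"):
        errors.append("table-uuid changed")
    if current_schema_fields(before) != current_schema_fields(after):
        errors.append("current schema fields changed")
    return errors
```

The sketch operates on the raw decoded records rather than on a library's model objects. That may be one way to address the concern above that going through iceberg-rust means the actual physical artifacts aren't being tested.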