I agree the parquet-testing repo should have example Parquet files storing variants.

It was brought to my attention recently that the duckdb folks made some
testing files[1] based on the Iceberg test suite. Perhaps we can add those
files to parquet-testing as part of [2].

I expect we'll get to testing the Rust shredding implementation in 2-3 weeks
at which time I will likely help try and push this forward. It would be great
if someone else wanted to help do it beforehand.

Andrew

[1]: https://github.com/duckdb/duckdb/pull/18224
[2]: https://github.com/apache/parquet-testing/issues/75

On Wed, Jul 23, 2025 at 1:14 AM Gang Wu <ust...@gmail.com> wrote:

> I was under the impression that parquet-testing does not yet have Parquet
> files with variant type annotations.
>
> Is this still the case? If not, should we add some (shredded and
> unshredded) files produced by Java and Go implementations?
>
> On Wed, Jul 23, 2025 at 3:18 AM Aihua Xu <aihu...@gmail.com> wrote:
>
> > Thanks Matt for the comment and working on the GO variant.
> >
> > Micah, that’s a good point. Let me check out the coverage completeness for
> > these two implementations.
> >
> > > On Jul 22, 2025, at 10:01 AM, Matt Topol <zotthewiz...@gmail.com> wrote:
> > >
> > > Assuming that the files with variants in
> > > https://github.com/apache/parquet-testing are generated by parquet-java,
> > > then we at least have confirmed that the Go implementation is able to read
> > > variant files that are written by the Java implementation. So there's at
> > > least some testing of the two implementations against each other.
> > >
> > > --Matt
> > >
> > >> On Tue, Jul 22, 2025 at 12:29 AM Micah Kornfield <emkornfi...@gmail.com>
> > >> wrote:
> > >>
> > >> Have we tested the two implementations against one another?
> > >>
> > >>> On Mon, Jul 21, 2025 at 9:14 PM Aihua Xu <aihu...@gmail.com> wrote:
> > >>>
> > >>> Hi community,
> > >>>
> > >>> Per the Parquet specification requirements, two reference implementations
> > >>> are needed to finalize the Variant logical type. Both Java and Go
> > >>> implementations now support variant encoding and shredding.
> > >>>
> > >>> Java already has the encoding and shredding implementations in place:
> > >>> apache/parquet-java#3197 <https://github.com/apache/parquet-java/pull/3197>
> > >>> apache/parquet-java#3202 <https://github.com/apache/parquet-java/pull/3202>
> > >>> apache/parquet-java#3223 <https://github.com/apache/parquet-java/issues/3223>
> > >>> apache/parquet-java#3211 <https://github.com/apache/parquet-java/issues/3211>
> > >>>
> > >>> Go also includes encoding and shredding support:
> > >>> apache/arrow-go#344 <https://github.com/apache/arrow-go/pull/344>
> > >>> apache/arrow-go#434 <https://github.com/apache/arrow-go/pull/434>
> > >>>
> > >>> I propose that we remove the "under development" notes from the
> > >>> documentation and move forward with finalizing the specification
> > >>> (PR #509 <https://github.com/apache/parquet-format/pull/509>).
> > >>> This vote will be open for at least 72 hours.
> > >>>
> > >>> [ ] +1 Finalize Varint and Shredding Spec
> > >>> [ ] +0
> > >>> [ ] -1 Do not release this because...