I'll work this week on getting the Go implementation to use the same testing files and ensure compatibility.
On Sun, Jul 27, 2025, 5:28 PM Aihua Xu <aihu...@gmail.com> wrote:

> Hi all,
>
> Following up on the test effort to validate the compatibility of the Variant implementation:
>
> Ryan has contributed test cases <https://github.com/apache/parquet-testing/pull/90/files> from Iceberg (see PR #13654 <https://github.com/apache/iceberg/pull/13654>), which I used to verify <https://github.com/apache/parquet-java/pull/3258/> the Variant implementation in Parquet-Java. The validation surfaced a few minor issues, but overall the results confirm compatibility between the two implementations.
>
> Let me know if you have any questions or additional follow-up requests.
>
> Thanks,
>
> Aihua
>
> On Wed, Jul 23, 2025 at 2:24 AM Andrew Lamb <andrewlam...@gmail.com> wrote:
>
> > I agree the parquet-testing repo should have example Parquet files storing variants.
> >
> > It was brought to my attention recently that the DuckDB folks made some testing files[1] based on the Iceberg test suite.
> >
> > Perhaps we can add those files to parquet-testing as part of [2].
> >
> > I expect we'll get to testing the Rust shredding implementation in 2-3 weeks, at which time I will likely help try and push this forward. It would be great if someone else wanted to help do it beforehand.
> >
> > Andrew
> >
> > [1]: https://github.com/duckdb/duckdb/pull/18224
> > [2]: https://github.com/apache/parquet-testing/issues/75
> >
> > On Wed, Jul 23, 2025 at 1:14 AM Gang Wu <ust...@gmail.com> wrote:
> >
> > > I was under the impression that parquet-testing does not yet have Parquet files with variant type annotations.
> > >
> > > Is this still the case? If not, should we add some (shredded and unshredded) files produced by the Java and Go implementations?
> > >
> > > On Wed, Jul 23, 2025 at 3:18 AM Aihua Xu <aihu...@gmail.com> wrote:
> > >
> > > > Thanks, Matt, for the comment and for working on the Go variant.
> > > >
> > > > Micah, that’s a good point. Let me check out the coverage completeness for these two implementations.
> > > >
> > > > > On Jul 22, 2025, at 10:01 AM, Matt Topol <zotthewiz...@gmail.com> wrote:
> > > > >
> > > > > Assuming that the files with variants in https://github.com/apache/parquet-testing are generated by parquet-java, then we have at least confirmed that the Go implementation is able to read variant files that are written by the Java implementation. So there's at least some testing of the two implementations against each other.
> > > > >
> > > > > --Matt
> > > > >
> > > > >> On Tue, Jul 22, 2025 at 12:29 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > > >>
> > > > >> Have we tested the two implementations against one another?
> > > > >>
> > > > >>> On Mon, Jul 21, 2025 at 9:14 PM Aihua Xu <aihu...@gmail.com> wrote:
> > > > >>>
> > > > >>> Hi community,
> > > > >>>
> > > > >>> Per the Parquet specification requirements, two reference implementations are needed to finalize the Variant logical type. Both Java and Go implementations now support variant encoding and shredding.
> > > > >>>
> > > > >>> Java already has the encoding and shredding implementations in place:
> > > > >>> apache/parquet-java#3197 <https://github.com/apache/parquet-java/pull/3197>
> > > > >>> apache/parquet-java#3202 <https://github.com/apache/parquet-java/pull/3202>
> > > > >>> apache/parquet-java#3223 <https://github.com/apache/parquet-java/issues/3223>
> > > > >>> apache/parquet-java#3211 <https://github.com/apache/parquet-java/issues/3211>
> > > > >>>
> > > > >>> Go also includes encoding and shredding support:
> > > > >>> apache/arrow-go#344 <https://github.com/apache/arrow-go/pull/344>
> > > > >>> apache/arrow-go#434 <https://github.com/apache/arrow-go/pull/434>
> > > > >>>
> > > > >>> I propose that we remove the "under development" notes from the documentation and move forward with finalizing the specification (PR #509 <https://github.com/apache/parquet-format/pull/509>).
> > > > >>> This vote will be open for at least 72 hours.
> > > > >>>
> > > > >>> [ ] +1 Finalize Variant and Shredding Spec
> > > > >>> [ ] +0
> > > > >>> [ ] -1 Do not release this because...