Thanks Aihua, I'm away for the rest of the week but I'll retest on Tuesday when I return.
Given the discussion we're having on the PR it seems that we should propose some clarifying updates to the spec wording to clear up some ambiguities. Can you and anyone else take a look at that this week? --Matt On Wed, Jul 30, 2025, 5:07 PM Aihua Xu <aihu...@gmail.com> wrote: > Hi Matt and community, > > I have created the Parquet file test cases > <https://github.com/apache/parquet-testing/pull/91/files> with the > variant logical type on top of what Ryan created. I have verified the > variant implementation in Parquet-Java (see > https://github.com/apache/parquet-java/pull/3258/files). Matt, can you > test out against the new data set in GO? > > Thanks, > Aihua > > On Mon, Jul 28, 2025 at 2:05 PM Ryan Blue <rdb...@gmail.com> wrote: > >> Yes, as Aihua mentioned we know that the annotation is missing. Iceberg >> uses released versions of Parquet so we won't be able to produce files >> with >> the annotation until there's a parquet-java release. This is one reason >> why >> I've been pushing to get the annotation released regardless of the status >> of the spec. >> >> On Mon, Jul 28, 2025 at 1:29 PM Aihua Xu <aihu...@gmail.com> wrote: >> >> > Hi Matt, >> > >> > Yeah. That's expected. I mentioned in the description of >> > https://github.com/apache/parquet-java/pull/3258. Because we haven't >> > released Parquet-Java including the variant logical type (we can't have >> a >> > release until we have done the validation and finalized the spec), >> Iceberg >> > cannot upgrade and write the Parquet file with that. >> > >> > I have a workaround in the test using something below to get them >> recognize >> > them as a variant type. >> > >> > (if (type.getName().equals("var") || type.getLogicalTypeAnnotation() >> > instanceof LogicalTypeAnnotation.VariantLogicalTypeAnnotation) >> > >> > I'm wondering if you can do the same in the GO test? >> > >> > Thanks, >> > Aihua >> > >> > On Mon, Jul 28, 2025 at 11:25 AM Matt Topol <zotthewiz...@gmail.com> >> > wrote: >> > >> > > @Aihua Xu <aihu...@gmail.com> @RyanBlue it appears that the parquet >> > files >> > > in the indicated PR ( >> > > https://github.com/apache/parquet-testing/pull/90/files) do not seem >> to >> > > currently have the `Variant` logical type specified in their schema. >> Is >> > > this intentional? The Go implementation won't (and shouldn't) >> recognize >> > the >> > > files as containing variant values without the logical type being >> used, >> > > instead they are simply read as structs. Should this be updated? >> > > >> > > Should I write something to have Go generate the files locally to test >> > > against? >> > > >> > > --Matt >> > > >> > > On Sun, Jul 27, 2025 at 5:45 PM Aihua Xu <aihu...@gmail.com> wrote: >> > > >> > >> Thanks Matt. >> > >> >> > >> >> > >> > On Jul 27, 2025, at 2:31 PM, Matt Topol <zotthewiz...@gmail.com> >> > wrote: >> > >> > >> > >> > I'll work this week on getting the Go implementation to use the >> same >> > >> > testing files and ensure compatibility. >> > >> > >> > >> >> On Sun, Jul 27, 2025, 5:28 PM Aihua Xu <aihu...@gmail.com> wrote: >> > >> >> >> > >> >> Hi all, >> > >> >> >> > >> >> Following up on the test effort to validate the compatibility of >> the >> > >> >> Variant implementation: >> > >> >> >> > >> >> Ryan has contributed test cases >> > >> >> <https://github.com/apache/parquet-testing/pull/90/files> from >> > Iceberg >> > >> >> (see PR >> > >> >> #13654 <https://github.com/apache/iceberg/pull/13654>), which I >> used >> > >> to >> > >> >> verify <https://github.com/apache/parquet-java/pull/3258/> the >> > Variant >> > >> >> implementation in Parquet-Java. The validation surfaced a few >> minor >> > >> issues, >> > >> >> but overall the results confirm compatibility between the two >> > >> >> implementations. >> > >> >> >> > >> >> Let me know if you have any questions or additional follow-up >> > requests. >> > >> >> >> > >> >> Thanks, >> > >> >> >> > >> >> Aihua >> > >> >> >> > >> >> >> > >> >> >> > >> >> On Wed, Jul 23, 2025 at 2:24 AM Andrew Lamb < >> andrewlam...@gmail.com> >> > >> >> wrote: >> > >> >> >> > >> >>> I agree the parquet-testing repo should have example Parquet >> files >> > >> >> storing >> > >> >>> variants. >> > >> >>> >> > >> >>> It was brought to my attention recently that the duckdb folks >> made >> > >> some >> > >> >>> testing files[1] based on the Iceberg test suite. >> > >> >>> >> > >> >>> Perhaps we can add those files to parquet-testing as part of [2]. >> > >> >>> >> > >> >>> I expect we'll get to testing the Rust shredding implementation >> in >> > 2-3 >> > >> >>> weeks at which time I will likely help try and push this >> forward. It >> > >> >> would >> > >> >>> be great if someone else wanted to help do it beforehand. >> > >> >>> >> > >> >>> Andrew >> > >> >>> >> > >> >>> [1]: https://github.com/duckdb/duckdb/pull/18224 >> > >> >>> [2]: https://github.com/apache/parquet-testing/issues/75 >> > >> >>> >> > >> >>>> On Wed, Jul 23, 2025 at 1:14 AM Gang Wu <ust...@gmail.com> >> wrote: >> > >> >>> >> > >> >>>> I was under the impression that parquet-testing does not yet >> have >> > >> >> Parquet >> > >> >>>> files with variant type annotations. >> > >> >>>> >> > >> >>>> Is this still the case? If not, should we add some (shredded and >> > >> >>>> unshredded) files produced by Java and Go implementations? >> > >> >>>> >> > >> >>>> On Wed, Jul 23, 2025 at 3:18 AM Aihua Xu <aihu...@gmail.com> >> > wrote: >> > >> >>>> >> > >> >>>>> Thanks Matt for the comment and working on the GO variant. >> > >> >>>>> >> > >> >>>>> Micah, that’s a good point. Let me check out the coverage >> > >> >> completeness >> > >> >>>> for >> > >> >>>>> these two implementations. >> > >> >>>>> >> > >> >>>>> >> > >> >>>>> >> > >> >>>>>> On Jul 22, 2025, at 10:01 AM, Matt Topol < >> zotthewiz...@gmail.com >> > > >> > >> >>>> wrote: >> > >> >>>>>> >> > >> >>>>>> Assuming that the files with variants in >> > >> >>>>>> https://github.com/apache/parquet-testing are generated by >> > >> >>>> parquet-java, >> > >> >>>>>> then we at least have confirmed that the Go implementation is >> > able >> > >> >> to >> > >> >>>>> read >> > >> >>>>>> variant files that are written by the Java implementation. So >> > >> >> there's >> > >> >>>> at >> > >> >>>>>> least some testing of the two implementations against each >> other. >> > >> >>>>>> >> > >> >>>>>> --Matt >> > >> >>>>>> >> > >> >>>>>>> On Tue, Jul 22, 2025 at 12:29 AM Micah Kornfield < >> > >> >>>> emkornfi...@gmail.com >> > >> >>>>>> >> > >> >>>>>>> wrote: >> > >> >>>>>>> >> > >> >>>>>>> Have we tested the two implementations against one another? >> > >> >>>>>>> >> > >> >>>>>>>> On Mon, Jul 21, 2025 at 9:14 PM Aihua Xu <aihu...@gmail.com >> > >> > >> >>> wrote: >> > >> >>>>>>>> >> > >> >>>>>>>> Hi community, >> > >> >>>>>>>> >> > >> >>>>>>>> Per the Parquet specification requirements, two reference >> > >> >>>>> implementations >> > >> >>>>>>>> are needed to finalize the Variant logical type. Both Java >> and >> > Go >> > >> >>>>>>>> implementations now support variant encoding and shredding. >> > >> >>>>>>>> >> > >> >>>>>>>> Java already has the encoding and shredding implementations >> in >> > >> >>> place: >> > >> >>>>>>>> apache/parquet-java#3197 < >> > >> >>>>>>> https://github.com/apache/parquet-java/pull/3197 >> > >> >>>>>>>>> >> > >> >>>>>>>> apache/parquet-java#3202 < >> > >> >>>>>>> https://github.com/apache/parquet-java/pull/3202 >> > >> >>>>>>>>> >> > >> >>>>>>>> apache/parquet-java#3223 >> > >> >>>>>>>> <https://github.com/apache/parquet-java/issues/3223> >> > >> >>>>>>>> apache/parquet-java#3211 >> > >> >>>>>>>> <https://github.com/apache/parquet-java/issues/3211> >> > >> >>>>>>>> >> > >> >>>>>>>> Go also includes encoding and shredding support: >> > >> >>>>>>>> apache/arrow-go#344 < >> > https://github.com/apache/arrow-go/pull/344 >> > >> >>> >> > >> >>>>>>>> apache/arrow-go#434 < >> > https://github.com/apache/arrow-go/pull/434 >> > >> >>> >> > >> >>>>>>>> >> > >> >>>>>>>> I propose that we remove the "under development" notes from >> the >> > >> >>>>>>>> documentation and move forward with finalizing the >> > specification >> > >> >>> (PR >> > >> >>>>> #509 >> > >> >>>>>>>> <https://github.com/apache/parquet-format/pull/509>). >> > >> >>>>>>>> This vote will be open for at least 72 hours. >> > >> >>>>>>>> >> > >> >>>>>>>> [ ] +1 Finalize Varint and Shredding Spec >> > >> >>>>>>>> [ ] +0 >> > >> >>>>>>>> [ ] -1 Do not release this because... >> > >> >>>>>>>> >> > >> >>>>>>> >> > >> >>>>> >> > >> >>>> >> > >> >>> >> > >> >> >> > >> >> > > >> > >> >