Hi community,

Please let me know whether a vote process is needed, or whether we can simply review this in https://github.com/apache/parquet-format/pull/509 (which removes the "under development" lines).

Thanks,
Aihua

On Mon, Aug 18, 2025 at 10:53 AM Aihua Xu <aihu...@gmail.com> wrote:

> Hi Micah and community,
>
> We’ve generated the test files from Go (PR #94
> <https://github.com/apache/parquet-testing/pull/94>) and successfully
> validated them in Parquet-Java (PR #3258
> <https://github.com/apache/parquet-java/pull/3258>). During testing, we
> identified two minor issues in the Go generation:
>
> 1. The spec version should be *1* instead of *0*.
> 2. The Parquet TIME type should be TIME(isAdjustedToUTC=false, MICROS)
>    instead of TIME(isAdjustedToUTC=true, MICROS).
>
> These issues have already been addressed by Matt.
>
> Looking ahead, here’s what I propose for closing out the Variant release:
>
> 1. Start a vote to finalize the Variant spec (removing the two lines
>    under *active development*).
> 2. Start a vote for the Parquet-Java 1.16.0 release.
>
> Please share your thoughts on these next steps, or let me know if you see
> anything else we should address before proceeding.
>
> Thanks,
> Aihua
>
> On Sun, Aug 17, 2025 at 9:28 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> > You want to see if the write path in GO is compatible? Let
>> > me check with Matt on this.
>>
>> Yes, IIUC, I think there are now multiple OSS reader implementations, that
>> have all been validated against parquet-java writing. So I think it is
>> important we validate a second writer can produce files that can be read by
>> parquet-java.
>>
>> Thanks,
>> Micah
>>
>> On Mon, Aug 11, 2025 at 9:17 AM Aihua Xu <aihu...@gmail.com> wrote:
>>
>> > Hi Micah,
>> >
>> > What we have done is to generate a large set of the test cases from the
>> > Iceberg project and validate in Java and GO. All of those implementations
>> > are independent. You want to see if the write path in GO is compatible? Let
>> > me check with Matt on this.
>> >
>> > Thanks,
>> > Aihua
>> >
>> > On Sun, Aug 10, 2025 at 9:24 PM Micah Kornfield <emkornfi...@gmail.com>
>> > wrote:
>> >
>> > > > We have completed cross-language validation for variant and the
>> > > > implementation compatibility appears solid
>> > >
>> > > Great, apologies if I missed it but did we verify Java being able to read
>> > > Go's output?
>> > >
>> > > On Fri, Aug 8, 2025 at 9:38 PM Aihua Xu <aihu...@gmail.com> wrote:
>> > >
>> > > > We have completed cross-language validation for variant and the
>> > > > implementation compatibility appears solid. Matt has raised some comments
>> > > > regarding how to handle invalid cases. In fact, we had a long discussion
>> > > > during the spec development about whether to explicitly define the behavior
>> > > > for such cases. We should be able to clear that out soon.
>> > > >
>> > > > > On Aug 8, 2025, at 2:35 PM, Jia Yu <ji...@apache.org> wrote:
>> > > > >
>> > > > > Hi Gang,
>> > > > >
>> > > > > Thanks for letting me know.
>> > > > >
>> > > > > Would it make sense to create a new Parquet Java branch that includes all
>> > > > > other commits except the Variant type implementation? That way, we could
>> > > > > release a version without Variant entirely.
>> > > > >
>> > > > > We’re eager to get the Geo type released, but at the same time, we don’t
>> > > > > want to rush the Variant work or ship something that’s not fully ready.
>> > > > >
>> > > > > Thanks,
>> > > > > Jia
>> > > > >
>> > > > >> On Fri, Aug 8, 2025 at 1:25 AM Gang Wu <ust...@gmail.com> wrote:
>> > > > >>
>> > > > >> parquet-cpp does not implement variant type yet, so it is safe to release
>> > > > >> the geo types. IIUC, there is no easy way to block users from producing
>> > > > >> files with variant types in parquet-java, so this is the main concern.
>> > > > >>
>> > > > >> Perhaps Aihua can provide an update on the progress?
>> > > > >>
>> > > > >> Best,
>> > > > >> Gang
>> > > > >>
>> > > > >>> On Fri, Aug 8, 2025 at 5:11 AM Jia Yu <ji...@apache.org> wrote:
>> > > > >>>
>> > > > >>> Hi all,
>> > > > >>>
>> > > > >>> Thank you for all your hard work on Parquet.
>> > > > >>>
>> > > > >>> Sorry for my ignorance, but I’d like to better understand why the Parquet
>> > > > >>> Java release for Geo types is currently tied to the Variant type work.
>> > > > >>> Arrow C++ (Parquet C++) has already been released with Geo type support,
>> > > > >>> and it doesn’t seem to have encountered similar issues.
>> > > > >>>
>> > > > >>> The Geo type support in Iceberg has been stalled for several months because
>> > > > >>> the Iceberg PMC cannot review or merge the implementation until there’s a
>> > > > >>> corresponding Parquet Java release.
>> > > > >>>
>> > > > >>> Would it be possible to proceed with a new Parquet Java release for Geo,
>> > > > >>> and mark the Variant type as experimental or keep it behind a feature flag?
>> > > > >>>
>> > > > >>> I’d really appreciate your thoughts on this and am looking forward to your
>> > > > >>> response.
>> > > > >>>
>> > > > >>> Thanks,
>> > > > >>> Jia
>> > > > >>>
>> > > > >>>> On Fri, Jul 18, 2025 at 10:33 AM Aihua Xu <aihu...@gmail.com> wrote:
>> > > > >>>>
>> > > > >>>> Seems the concern from Gabor is that we should finalize the Variant spec
>> > > > >>>> ( https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
>> > > > >>>> and
>> > > > >>>> https://github.com/apache/parquet-format/blob/master/VariantShredding.md ),
>> > > > >>>> have a parquet-format release, and then move forward with parquet-java
>> > > > >>>> release. I totally agree.
>> > > > >>>>
>> > > > >>>> We should have met the requirement with two reference implementations for
>> > > > >>>> Variant in open source and I will start a VOTE thread separately to close
>> > > > >>>> out the Variant spec if no objections.
>> > > > >>>>
>> > > > >>>> Thanks for the discussions.
>> > > > >>>> Aihua
>> > > > >>>>
>> > > > >>>> On Thu, Jul 17, 2025 at 3:41 AM Andrew Lamb <andrewlam...@gmail.com>
>> > > > >>>> wrote:
>> > > > >>>>
>> > > > >>>>>> At this point, I’d like to check if we have enough implementation coverage
>> > > > >>>>>> to move forward with finalizing the Variant spec. Would it make sense to
>> > > > >>>>>> start a vote thread at this stage?
>> > > > >>>>>
>> > > > >>>>> In my opinion we have sufficient open source implementations (the Golang
>> > > > >>>>> implementation on arrow-go) and a vote to finalize the spec would be
>> > > > >>>>> appropriate (and welcome)
>> > > > >>>>>
>> > > > >>>>> From my experience working on the Rust implementation so far, I have found
>> > > > >>>>> the spec clear and easy to understand, the design well thought out, and
>> > > > >>>>> have not encountered anything that would require any changes.
>> > > > >>>>>
>> > > > >>>>> Kudos to the team who designed and wrote the spec for this feature,
>> > > > >>>>> Andrew
>> > > > >>>>>
>> > > > >>>>> On Thu, Jul 17, 2025 at 2:08 AM Jia Yu <ji...@apache.org> wrote:
>> > > > >>>>>
>> > > > >>>>>> Thanks Aihua!
>> > > > >>>>>>
>> > > > >>>>>> The geo type implementation in Iceberg is currently blocked by this
>> > > > >>>>>> release. Really looking forward to it.
>> > > > >>>>>>
>> > > > >>>>>> Jia
>> > > > >>>>>>
>> > > > >>>>>> On Wed, Jul 16, 2025 at 10:47 PM Gábor Szádovszky <ga...@apache.org>
>> > > > >>>>>> wrote:
>> > > > >>>>>>
>> > > > >>>>>>> My concern was related to the current stage of the Variant specification
>> > > > >>>>>>> and the fact that we started talking about releasing parquet-java with
>> > > > >>>>>>> Variant features.
>> > > > >>>>>>> If we formally release parquet-format with the finalized Variant spec
>> > > > >>>>>>> first, then I have no concerns about writing Variant values in the upcoming
>> > > > >>>>>>> parquet-java release. Otherwise, we need to block it by default and mark it
>> > > > >>>>>>> as an experimental feature.
>> > > > >>>>>>>
>> > > > >>>>>>> Cheers,
>> > > > >>>>>>> Gabor
>> > > > >>>>>>>
>> > > > >>>>>>> On Wed, Jul 16, 2025 at 7:37 PM Aihua Xu <aihu...@gmail.com> wrote:
>> > > > >>>>>>>
>> > > > >>>>>>>> Hi Gabor and all,
>> > > > >>>>>>>>
>> > > > >>>>>>>> Here’s my current understanding of the progress on the *Variant* support in
>> > > > >>>>>>>> Parquet:
>> > > > >>>>>>>>
>> > > > >>>>>>>> - Per Parquet's requirements, we need at least two reference
>> > > > >>>>>>>>   implementations to finalize the Variant logical type specification.
>> > > > >>>>>>>> - The community is actively working on Java, Go, and Rust implementations:
>> > > > >>>>>>>>   - Java already has the encoding and shredding implementations in place:
>> > > > >>>>>>>>     - Variant Decoding <https://github.com/apache/parquet-java/pull/3197>
>> > > > >>>>>>>>     - Variant Encoding <https://github.com/apache/parquet-java/pull/3202>
>> > > > >>>>>>>>     - Variant Shredding Writer <https://github.com/apache/parquet-java/issues/3223>
>> > > > >>>>>>>>     - Variant Shredding Reader <https://github.com/apache/parquet-java/issues/3211>
>> > > > >>>>>>>>   - Go also includes encoding and shredding support:
>> > > > >>>>>>>>     - Variant Encoding/Decoding <https://github.com/apache/arrow-go/pull/344>
>> > > > >>>>>>>>     - Variant Shredding <https://github.com/apache/arrow-go/pull/434>
>> > > > >>>>>>>>   - Rust is currently working on the shredding implementation.
>> > > > >>>>>>>>
>> > > > >>>>>>>> In addition to these, we already have a full Variant implementation in
>> > > > >>>>>>>> Apache Iceberg, as well as in some closed-source engines.
>> > > > >>>>>>>>
>> > > > >>>>>>>> At this point, I’d like to check if we have enough implementation coverage
>> > > > >>>>>>>> to move forward with finalizing the Variant spec. Would it make sense to
>> > > > >>>>>>>> start a vote thread at this stage?
>> > > > >>>>>>>>
>> > > > >>>>>>>> Ultimately, our goal is to release a new version of parquet-format and
>> > > > >>>>>>>> parquet-java that includes the Variant logical type, so that Iceberg and
>> > > > >>>>>>>> other engines can officially depend on it and proceed with further
>> > > > >>>>>>>> implementation.
>> > > > >>>>>>>>
>> > > > >>>>>>>> Let me know your thoughts and how we should proceed.
>> > > > >>>>>>>>
>> > > > >>>>>>>> Thanks,
>> > > > >>>>>>>>
>> > > > >>>>>>>> Aihua
>> > > > >>>>>>>>
>> > > > >>>>>>>> On Sun, Jul 13, 2025 at 10:08 PM Gábor Szádovszky <ga...@apache.org>
>> > > > >>>>>>>> wrote:
>> > > > >>>>>>>>
>> > > > >>>>>>>>> Hi,
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> I was not able to open the recordings of the last meeting because of
>> > > > >>>>>>>>> permission issues. (Shouldn't these be accessible for anyone?)
>> > > > >>>>>>>>> So, I'm not sure if you have talked about this, but the Variant spec is
>> > > > >>>>>>>>> still not final. Since parquet-java already has Variant support, how do we
>> > > > >>>>>>>>> prevent writing potentially invalid Variant data with the proper logical
>> > > > >>>>>>>>> types we will use for the finalized spec? Is it behind a feature flag?
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> Cheers,
>> > > > >>>>>>>>> Gabor
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> On Fri, Jul 11, 2025 at 7:33 PM Aihua Xu <aihu...@gmail.com> wrote:
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>> Hi community,
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> As discussed in the last community sync-up meeting, I'd like to proceed
>> > > > >>>>>>>>>> with releasing *Parquet-Java 1.16.0*, which will include support for
>> > > > >>>>>>>>>> *geo-type* and *variant*.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> Please let me know if you have any objections or if you have any upcoming
>> > > > >>>>>>>>>> changes you'd like to include in this release.
>> > > > >>>>>>>>>> Thanks,
>> > > > >>>>>>>>>> Aihua