Thanks a lot for driving this forward. Let me know if I can help in any way.
On Thu, Mar 20, 2025 at 8:33 AM Ryan Blue <rdb...@gmail.com> wrote:
> Sounds great! Thanks, Gang!
>
> On Thu, Mar 20, 2025 at 8:32 AM Gang Wu <ust...@gmail.com> wrote:
> > Thanks Ryan! If there are no objections, I'll go ahead and create the parquet-format 2.11.0 RC0 tomorrow.
> >
> > On Thu, Mar 20, 2025 at 11:27 PM Ryan Blue <rdb...@gmail.com> wrote:
> > > Yes, Micah updated it yesterday and it looks good to me. I'll merge it.
> > >
> > > On Wed, Mar 19, 2025 at 11:47 PM Gang Wu <ust...@gmail.com> wrote:
> > > > Should we merge https://github.com/apache/parquet-format/pull/474 before releasing the format?
> > > >
> > > > On Thu, Mar 20, 2025 at 12:52 AM Fokko Driesprong <fo...@apache.org> wrote:
> > > > > As mentioned earlier, it looks like the implementations <https://lists.apache.org/thread/361035vf227w11q6r6w6t67618jlpfbx> are moving forward. I think it would be good to have the format released, as long as we're all comfortable with its current state, to unblock the other releases.
> > > > >
> > > > > Kind regards,
> > > > > Fokko
> > > > >
> > > > > On Wed, Mar 19, 2025 at 4:15 AM Gang Wu <ust...@gmail.com> wrote:
> > > > > > I'm fine with releasing parquet-format to unblock Variant (and geometry) development.
> > > > > >
> > > > > > We can add a statement to the documentation and release notes warning that this is experimental and subject to change.
> > > > > >
> > > > > > Best,
> > > > > > Gang
> > > > > >
> > > > > > On Wed, Mar 19, 2025 at 7:08 AM Aihua Xu <aihu...@gmail.com> wrote:
> > > > > > > Hi Ryan,
> > > > > > >
> > > > > > > Thanks a lot for explaining the process of adding the Variant type annotation in Java. It would be great if we could add the logical type now to unblock some code paths that rely on such logical types.
> > > > > > >
> > > > > > > On Tue, Mar 18, 2025 at 1:22 PM Ryan Blue <rdb...@gmail.com> wrote:
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > At the last community sync, we discussed a few Variant topics that I also want to raise here.
> > > > > > > >
> > > > > > > > First, I want to highlight that we talked about spec versioning, and there was agreement to avoid unnecessary complexity by maintaining one version that covers both encoding and shredding. The two parts of Variant are in separate markdown docs, but we don't want fragmentation where an implementation follows the encoding spec but not the shredding spec. We want to keep the two fully compatible and ensure that you can rely on shredding for data skipping. Note that this doesn't add many requirements for writers (shredding is optional). It does ensure that shredding can be used, because support for it is required in readers. This wasn't very controversial, but please reply if you have concerns about it.
> > > > > > > >
> > > > > > > > Second, in the sync I also brought up the need to get the Variant type annotation into a release. There was some pushback on this, I think, because non-Java projects have a simpler build process. I don't think many people realized how the Java side works:
> > > > > > > >
> > > > > > > >   - The Thrift file is released in the parquet-format Jar.
> > > > > > > >   - The parquet-java Maven build has a parquet-format-structures module that downloads the parquet-format Jar and generates Java classes from the Thrift definition.
> > > > > > > >   - Then a translation step in ParquetMetadataConverter converts the Thrift classes to Parquet classes.
> > > > > > > >   - The Parquet API adds utilities for working with the converted metadata classes, like LogicalTypeAnnotationVisitor.
> > > > > > > >
> > > > > > > > We discussed creating sample files to verify Variant compatibility. To produce those files from the Variant implementation I'm building, I would need to:
> > > > > > > >
> > > > > > > >   - produce a one-off version of the Thrift definition,
> > > > > > > >   - inject it into the parquet-format-structures build,
> > > > > > > >   - update the Parquet API to add a VariantLogicalTypeAnnotation, and
> > > > > > > >   - find a way to get that temporary build into a Maven repository so that PRs can depend on it.
> > > > > > > >
> > > > > > > > Those PRs can't be merged, because they would rely on an unpublished/snapshot Jar.
> > > > > > > >
> > > > > > > > This is becoming a big headache, and I don't think it helps us deliver Variant reliably. We know that we are going to support Variant data, and the type annotation that labels that data is going to be needed. There is little risk in committing that annotation now.
> > > > > > > >
> > > > > > > > The argument against adding it now was that it may create confusion. I think it is unlikely that anyone would see the annotation in parquet-java or parquet-format and assume support guarantees. The spec is clearly marked experimental, and the annotation is already referenced by LogicalTypes.md <https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#variant>. Adding the annotation to the API doesn't significantly increase the number of people who see it, and the people who do are already working with Parquet internals.
> > > > > > > >
> > > > > > > > The only other objection I know about is that we haven't yet finalized what goes in the annotation. I agree that this is a prerequisite for moving forward. I suggest that we:
> > > > > > > >
> > > > > > > >   - add the encoding/shredding version (1) to the annotation,
> > > > > > > >   - release parquet-format, and
> > > > > > > >   - start working on the API extensions in parquet-java.
> > > > > > > >
> > > > > > > > Then we can actually work on a Java implementation in the parquet-java project that stores Variant data in files.
> > > > > > > >
> > > > > > > > Does that sound reasonable?
> > > > > > > >
> > > > > > > > Ryan
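Ryan lists four Java-side steps: the Thrift definition ships in the parquet-format Jar, parquet-format-structures generates classes from it, ParquetMetadataConverter translates those into Parquet classes, and the API exposes them through visitors. Those steps can be sketched roughly as follows. This is a minimal, self-contained illustration, not parquet-java code. The names ThriftVariantType, VariantAnnotation, AnnotationVisitor and ConverterSketch are hypothetical stand-ins for the roles described in the thread. The sketch also assumes the annotation carries only the single specification version (1) proposed above, which covers both encoding and shredding.

```java
// Minimal sketch. Every class below is HYPOTHETICAL and only mirrors the
// roles described in the thread; none of this is parquet-java's real API.
class VariantAnnotationSketch {
  public static void main(String[] args) {
    // Reader side: Thrift-generated struct -> Parquet-side annotation.
    VariantAnnotation annotation = ConverterSketch.fromThrift(new ThriftVariantType((byte) 1));

    // Consumers dispatch on the annotation through a visitor (a lambda here).
    String description = annotation.accept(
        v -> "VARIANT(specification_version=" + v.specificationVersion() + ")");
    System.out.println(description); // prints VARIANT(specification_version=1)

    // Writer side: Parquet-side annotation -> Thrift-generated struct.
    ThriftVariantType written = ConverterSketch.toThrift(annotation);
    System.out.println(written.specificationVersion); // prints 1

    // A version this reader does not implement is rejected rather than guessed at.
    try {
      ConverterSketch.fromThrift(new ThriftVariantType((byte) 2));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // prints Unsupported Variant specification version: 2
    }
  }
}

// Hypothetical stand-in for a class generated from the Thrift definition
// (the step parquet-format-structures performs). Field name is assumed.
final class ThriftVariantType {
  final byte specificationVersion;

  ThriftVariantType(byte specificationVersion) {
    this.specificationVersion = specificationVersion;
  }
}

// Hypothetical visitor, similar in spirit to LogicalTypeAnnotationVisitor.
interface AnnotationVisitor<T> {
  T visitVariant(VariantAnnotation variant);
}

// Hypothetical Parquet-side (non-Thrift) annotation class.
final class VariantAnnotation {
  // Assumption from the thread: one version covers encoding and shredding.
  static final int SUPPORTED_VERSION = 1;

  private final int specificationVersion;

  VariantAnnotation(int specificationVersion) {
    this.specificationVersion = specificationVersion;
  }

  int specificationVersion() {
    return specificationVersion;
  }

  <T> T accept(AnnotationVisitor<T> visitor) {
    return visitor.visitVariant(this);
  }
}

// Hypothetical translation step, playing the role of ParquetMetadataConverter.
final class ConverterSketch {
  static VariantAnnotation fromThrift(ThriftVariantType thrift) {
    int version = thrift.specificationVersion;
    // Readers that accept version 1 must handle both encoding and shredding.
    if (version != VariantAnnotation.SUPPORTED_VERSION) {
      throw new IllegalArgumentException("Unsupported Variant specification version: " + version);
    }
    return new VariantAnnotation(version);
  }

  static ThriftVariantType toThrift(VariantAnnotation annotation) {
    return new ThriftVariantType((byte) annotation.specificationVersion());
  }
}
```

Keeping the Thrift-generated class separate from the Parquet-side annotation is why the format release comes first. The converter cannot map a struct that the released Thrift definition does not yet contain.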