Thanks a lot for driving this forward. Let me know if I can help in any way.

On Thu, Mar 20, 2025 at 8:33 AM Ryan Blue <rdb...@gmail.com> wrote:

> Sounds great! Thanks, Gang!
>
> On Thu, Mar 20, 2025 at 8:32 AM Gang Wu <ust...@gmail.com> wrote:
>
> > Thanks Ryan! If there is no objection, I'll go ahead and create the
> > parquet-format 2.11.0 RC0 tomorrow.
> >
> > On Thu, Mar 20, 2025 at 11:27 PM Ryan Blue <rdb...@gmail.com> wrote:
> >
> > > Yes, Micah updated it yesterday and it looks good to me. I'll merge it.
> > >
> > > On Wed, Mar 19, 2025 at 11:47 PM Gang Wu <ust...@gmail.com> wrote:
> > >
> > > > Should we merge https://github.com/apache/parquet-format/pull/474
> > > > before releasing the format?
> > > >
> > > > On Thu, Mar 20, 2025 at 12:52 AM Fokko Driesprong <fo...@apache.org>
> > > > wrote:
> > > >
> > > > > As mentioned earlier, it looks like the implementations
> > > > > <https://lists.apache.org/thread/361035vf227w11q6r6w6t67618jlpfbx>
> > > > > are moving forward. I think it would be good to have the format
> > > > > released, as long as we're all comfortable with the current state,
> > > > > to unblock the other releases.
> > > > >
> > > > > Kind regards,
> > > > > Fokko
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Mar 19, 2025 at 4:15 AM Gang Wu <ust...@gmail.com> wrote:
> > > > >
> > > > > > I'm fine with releasing the parquet-format to unblock variant (and
> > > > > > geometry) development.
> > > > > >
> > > > > > We can add a statement to the documentation and release notes to
> > > > > > warn that this is experimental and subject to change.
> > > > > >
> > > > > > Best,
> > > > > > Gang
> > > > > >
> > > > > > On Wed, Mar 19, 2025 at 7:08 AM Aihua Xu <aihu...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Ryan,
> > > > > > >
> > > > > > > Thanks a lot for explaining the process to add the Variant type
> > > > > > > annotation in Java. It would be great if we can add the logical
> > > > > > > type now to unblock some code paths which rely on such logical
> > > > > > > types.
> > > > > > >
> > > > > > > On Tue, Mar 18, 2025 at 1:22 PM Ryan Blue <rdb...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > > At the last community sync, we had a few Variant topics that I
> > > > > > > > also want to raise here.
> > > > > > > >
> > > > > > > > First, I want to highlight that we talked about spec versioning
> > > > > > > > and there was agreement to avoid unnecessary complexity by
> > > > > > > > maintaining one version that covers both encoding and shredding.
> > > > > > > > The two parts of Variant are in separate markdown docs, but we
> > > > > > > > don’t want to have fragmentation where an implementation follows
> > > > > > > > the encoding but not shredding. We want to keep the two fully
> > > > > > > > compatible and ensure that you can rely on shredding for data
> > > > > > > > skipping. Note that this doesn’t add many requirements for
> > > > > > > > writers (shredding is optional), but ensures that shredding can
> > > > > > > > be used by requiring support in readers. This wasn’t very
> > > > > > > > controversial, but please reply if you have concerns about it.
> > > > > > > >
> > > > > > > > Second, in the sync I also brought up the need to get the
> > > > > > > > Variant type annotation into a release. There was some pushback
> > > > > > > > on this, I think, because non-Java projects have a simpler build
> > > > > > > > process. I don’t think many people realized how the Java side
> > > > > > > > works:
> > > > > > > >
> > > > > > > >    - The Thrift file is released in the parquet-format Jar
> > > > > > > >    - The parquet-java Maven build has a parquet-format-structures
> > > > > > > >    module that downloads the parquet-format Jar and generates
> > > > > > > >    Java classes from the Thrift definition
> > > > > > > >    - Then there is a translation step in ParquetMetadataConverter
> > > > > > > >    that converts to Parquet (rather than Thrift) classes
> > > > > > > >    - The Parquet API adds utilities to use the converted metadata
> > > > > > > >    classes, like LogicalTypeAnnotationVisitor
> > > > > > > >
> > > > > > > > We discussed creating sample files to verify Variant
> > > > > > > > compatibility, but in order to produce those files from the
> > > > > > > > Variant implementation I’m building, I would need to produce a
> > > > > > > > one-off version of the Thrift definition, inject it into the
> > > > > > > > parquet-format-structures build, update the Parquet API to add a
> > > > > > > > VariantLogicalTypeAnnotation, and then find a way to get that
> > > > > > > > temporary build into a Maven repository so that PRs can depend
> > > > > > > > on it. Those PRs can’t be merged because they would rely on an
> > > > > > > > unpublished/snapshot Jar.
> > > > > > > >
> > > > > > > > This is becoming a big headache and I don’t think that it helps
> > > > > > > > us deliver Variant reliably. We know that we are going to
> > > > > > > > support Variant data and the type annotation to label that data
> > > > > > > > is going to be needed. There is little risk in committing that
> > > > > > > > annotation now. The argument against adding it now was that it
> > > > > > > > may create confusion, but I think it is unlikely that anyone
> > > > > > > > would see the annotation in parquet-java or parquet-format and
> > > > > > > > assume support guarantees. The spec is clearly marked
> > > > > > > > experimental and the annotation is referenced already by
> > > > > > > > LogicalTypes.md
> > > > > > > > <https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#variant>.
> > > > > > > > Adding the annotation to the API doesn’t significantly increase
> > > > > > > > the number of people seeing it, and the people that do are
> > > > > > > > already working with Parquet internals.
> > > > > > > >
> > > > > > > > The only other objection I know about is that we haven’t yet
> > > > > > > > finalized what goes in the annotation, which I agree is a
> > > > > > > > prerequisite for moving forward. I suggest that we add the
> > > > > > > > encoding/shredding version (1), release parquet-format, and
> > > > > > > > start working on the API extensions in parquet-java. Then we can
> > > > > > > > actually work on a Java implementation that stores data in files
> > > > > > > > in the parquet-java project.
> > > > > > > >
> > > > > > > > Does that sound reasonable?
> > > > > > > >
> > > > > > > > Ryan
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>