I'm fine with releasing the parquet-format to unblock variant (and
geometry) development.

We can add a statement to the documentation and release note to warn that
this is experimental and subject to change.

Best,
Gang

On Wed, Mar 19, 2025 at 7:08 AM Aihua Xu <aihu...@gmail.com> wrote:

> Hi Ryan,
>
> Thanks a lot for explaining the process to add Variant type annotation in
> Java. It would be great if we can add the logical type now to unblock some
> code paths which rely on such logical types.
>
> On Tue, Mar 18, 2025 at 1:22 PM Ryan Blue <rdb...@gmail.com> wrote:
>
> > Hi everyone,
> > Last community sync, we had a few Variant topics that I want to also
> raise
> > here.
> >
> > First, I want to highlight that we talked about spec versioning and there
> > was agreement to avoid unnecessary complexity by maintaining one version
> > that covers both encoding and shredding. The two parts of Variant are in
> > separate markdown docs, but we don’t want to have fragmentation where an
> > implementation follows the encoding but not shredding. We want to keep
> the
> > two fully compatible and ensure that you can rely on shredding for data
> > skipping. Note that this doesn’t add many requirements for writers
> > (shredding is optional), but ensures that shredding can be used by
> > requiring support in readers. This wasn’t very controversial, but please
> > reply if you have concerns about it.
> >
> > Second, in the sync I also brought up the need to get the Variant type
> > annotation into a release. There was some pushback on this, I think,
> > because non-Java projects have a simpler build process. I don’t think
> many
> > people realized how the Java side works:
> >
> >    - The thrift file is released in the parquet-format Jar
> >    - The parquet-java maven build has a parquet-format-structures module
> >    that downloads the parquet-format Jar and generates Java classes from
> > the
> >    Thrift definition
> >    - Then there is a translation step in ParquetMetadataConverter that
> >    converts to Parquet (rather than Thrift) classes
> >    - The Parquet API adds utilities to use the converted metadata
> classes,
> >    like LogicalTypeAnnotationVisitor
> >
> > We discussed creating sample files to verify Variant compatibility, but
> in
> > order to produce those files from the Variant implementation I’m
> building,
> > I would need to produce a one-off version of the thrift definition,
> inject
> > it into the parquet-thrift-format-structures build, update the Parquet
> API
> > to add a VariantLogicalTypeAnnotation, and then find a way to get that
> > temporary build into a Maven repository so that PRs can depend on it.
> Those
> > PRs can’t be merged because they would rely on an unpublished/snapshot
> Jar.
> >
> > This is becoming a big headache and I don’t think that it helps us
> deliver
> > Variant reliably. We know that we are going to support Variant data and
> the
> > type annotation to label that data is going to be needed. There is little
> > risk in committing that annotation now. The argument against adding it
> now
> > was that it may create confusion, but I think it is unlikely that anyone
> > would see the annotation in parquet-java or parquet-format and assume
> > support guarantees. The spec is clearly marked experimental and the
> > annotation is referenced already by LogicalTypes.md
> > <
> >
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#variant
> > >.
> > Adding the annotation to the API doesn’t significantly increase the
> number
> > of people seeing it, and the people that do are already working with
> > Parquet internals.
> >
> > The only other objection I know about is that we haven’t yet finalized
> what
> > goes in the annotation, which I agree is a prerequisite for moving
> forward.
> > I suggest that we add the encoding/shredding version (1), release
> > parquet-format, and start working on the API extensions in parquet-java.
> > Then we can actually work on a Java implementation that stores data in
> > files in the parquet-java project.
> >
> > Does that sound reasonable?
> >
> > Ryan
> >
>

Reply via email to