Yes, Micah updated it yesterday and it looks good to me. I'll merge it.

On Wed, Mar 19, 2025 at 11:47 PM Gang Wu <ust...@gmail.com> wrote:

> Should we merge https://github.com/apache/parquet-format/pull/474 before
> releasing the format?
>
> On Thu, Mar 20, 2025 at 12:52 AM Fokko Driesprong <fo...@apache.org>
> wrote:
>
> > As mentioned earlier, it looks like the implementations
> > <https://lists.apache.org/thread/361035vf227w11q6r6w6t67618jlpfbx> are
> > moving forward. I think it would be good to have the format released, as
> > long as we're all comfortable with the current state to unblock the other
> > releases.
> >
> > Kind regards,
> > Fokko
> >
> >
> >
> > On Wed, Mar 19, 2025 at 4:15 AM Gang Wu <ust...@gmail.com> wrote:
> >
> > > I'm fine with releasing the parquet-format to unblock variant (and
> > > geometry) development.
> > >
> > > We can add a statement to the documentation and release notes to warn
> > > that this is experimental and subject to change.
> > >
> > > Best,
> > > Gang
> > >
> > > On Wed, Mar 19, 2025 at 7:08 AM Aihua Xu <aihu...@gmail.com> wrote:
> > >
> > > > Hi Ryan,
> > > >
> > > > Thanks a lot for explaining the process to add Variant type annotation
> > > > in Java. It would be great if we can add the logical type now to
> > > > unblock some code paths which rely on such logical types.
> > > >
> > > > On Tue, Mar 18, 2025 at 1:22 PM Ryan Blue <rdb...@gmail.com> wrote:
> > > >
> > > > > Hi everyone,
> > > > > > Last community sync, we had a few Variant topics that I want to also
> > > > > > raise here.
> > > > >
> > > > > > First, I want to highlight that we talked about spec versioning and
> > > > > > there was agreement to avoid unnecessary complexity by maintaining
> > > > > > one version that covers both encoding and shredding. The two parts
> > > > > > of Variant are in separate markdown docs, but we don’t want to have
> > > > > > fragmentation where an implementation follows the encoding but not
> > > > > > shredding. We want to keep the two fully compatible and ensure that
> > > > > > you can rely on shredding for data skipping. Note that this doesn’t
> > > > > > add many requirements for writers (shredding is optional), but
> > > > > > ensures that shredding can be used by requiring support in readers.
> > > > > > This wasn’t very controversial, but please reply if you have
> > > > > > concerns about it.
> > > > >
> > > > > > Second, in the sync I also brought up the need to get the Variant
> > > > > > type annotation into a release. There was some pushback on this, I
> > > > > > think, because non-Java projects have a simpler build process. I
> > > > > > don’t think many people realized how the Java side works:
> > > > >
> > > > > >    - The thrift file is released in the parquet-format Jar
> > > > > >    - The parquet-java maven build has a parquet-format-structures
> > > > > >    module that downloads the parquet-format Jar and generates Java
> > > > > >    classes from the Thrift definition
> > > > > >    - Then there is a translation step in ParquetMetadataConverter
> > > > > >    that converts to Parquet (rather than Thrift) classes
> > > > > >    - The Parquet API adds utilities to use the converted metadata
> > > > > >    classes, like LogicalTypeAnnotationVisitor
> > > > >
> > > > > > We discussed creating sample files to verify Variant compatibility,
> > > > > > but in order to produce those files from the Variant implementation
> > > > > > I’m building, I would need to produce a one-off version of the
> > > > > > thrift definition, inject it into the parquet-format-structures
> > > > > > build, update the Parquet API to add a VariantLogicalTypeAnnotation,
> > > > > > and then find a way to get that temporary build into a Maven
> > > > > > repository so that PRs can depend on it. Those PRs can’t be merged
> > > > > > because they would rely on an unpublished/snapshot Jar.
> > > > >
> > > > > > This is becoming a big headache and I don’t think that it helps us
> > > > > > deliver Variant reliably. We know that we are going to support
> > > > > > Variant data and the type annotation to label that data is going to
> > > > > > be needed. There is little risk in committing that annotation now.
> > > > > > The argument against adding it now was that it may create confusion,
> > > > > > but I think it is unlikely that anyone would see the annotation in
> > > > > > parquet-java or parquet-format and assume support guarantees. The
> > > > > > spec is clearly marked experimental and the annotation is referenced
> > > > > > already by LogicalTypes.md
> > > > > > <https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#variant>.
> > > > > > Adding the annotation to the API doesn’t significantly increase the
> > > > > > number of people seeing it, and the people that do are already
> > > > > > working with Parquet internals.
> > > > >
> > > > > > The only other objection I know about is that we haven’t yet
> > > > > > finalized what goes in the annotation, which I agree is a
> > > > > > prerequisite for moving forward. I suggest that we add the
> > > > > > encoding/shredding version (1), release parquet-format, and start
> > > > > > working on the API extensions in parquet-java. Then we can actually
> > > > > > work on a Java implementation that stores data in files in the
> > > > > > parquet-java project.
> > > > >
> > > > > Does that sound reasonable?
> > > > >
> > > > > Ryan
> > > > >
> > > >
> > >
> >
>
