Thanks, Fokko, for bringing this up! Yes, I can be the release manager
if the community reaches a consensus.

Best,
Gang

On Mon, Feb 17, 2025 at 6:58 PM Fokko Driesprong <fo...@apache.org> wrote:

> Hey everyone,
>
> I would love to bubble this back up to the top of our mailboxes.
>
>    - For Variant, various implementations are in flight: Java in
>    Parquet-Java <https://github.com/apache/parquet-java/pull/3117> and
>    Iceberg-Java <https://github.com/apache/iceberg/pull/12139>, C++
>    <https://github.com/apache/arrow/pull/45375> in Arrow, Python
>    <https://github.com/apache/spark/blob/master/python/pyspark/sql/variant_utils.py>
>    in Spark, and the Arrow Rust community also expressed interest
>    <https://github.com/apache/arrow-rs/issues/6736>.
>    - For Geometry/Geography, we see a C++ PR
>    <https://github.com/apache/arrow/pull/45459> in Arrow, Java in Parquet
>    <https://github.com/apache/parquet-java/pull/2971>, but the vote only
>    passed last week. We also see that geo support has been added to
>    Iceberg <https://github.com/apache/iceberg/pull/10981>.
>
> Both Variant and Geo have been voted for and merged into the format spec. To
> maintain momentum, I think it would be good to get the Thrift definitions
> and the Java convenience JAR out.
>
> Does anyone have any questions or concerns about getting this out? Gang,
> you mentioned that you would like to volunteer as release manager, are
> you still available? :)
>
> Kind regards,
> Fokko
>
>
> On Thu, Dec 5, 2024 at 05:33 Gene Pang <gene.p...@gmail.com> wrote:
>
> > I see, thanks for the clarifications!
> >
> > I will work on porting the Spark Java implementation to parquet-java.
> >
> > Spark also has a (partial) Python implementation for the Variant binary
> > format, but it needs a bit more work to complete.
> >
> > Thanks,
> > Gene
> >
> > On Wed, Dec 4, 2024 at 6:11 AM Andrew Lamb <andrewlam...@gmail.com> wrote:
> >
> > > We were also discussing trying to implement Variant in Rust [1], but it
> > > was hard due to a lack of other implementations or example data to test
> > > against.
> > >
> > > Maybe once there is a draft POC for Java, we could whip up something for
> > > Rust that does the same.
> > >
> > > [1]: https://github.com/apache/arrow-rs/issues/6736
> > >
> > > On Wed, Dec 4, 2024 at 4:57 AM Gang Wu <ust...@gmail.com> wrote:
> > >
> > > > > With regards to Variant implementations, for Java, don't we need the
> > > > > format released before the implementation can be provided (I thought
> > > > > parquet-java consumed a released parquet-format jar in its build)?
> > > >
> > > > For parquet-java, the PoC PR is usually based on a locally built
> > > > parquet-format with an unreleased version while the spec change is under
> > > > review. Once the vote has passed and a new parquet-format is released,
> > > > the PoC PR gets rebased on the released format for a final review. Below
> > > > are some examples:
> > > >
> > > > float16: https://github.com/apache/parquet-java/pull/1142
> > > > size stats: https://github.com/apache/parquet-java/pull/1177
> > > > geometry: https://github.com/apache/parquet-java/pull/2971
> > > >
> > > > Best,
> > > > Gang
> > > >
> > > > On Wed, Dec 4, 2024 at 2:57 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > >
> > > > > Hi Gene,
> > > > >
> > > > > I added a proposal to include a shredding version in the annotation
> > > > > (https://github.com/apache/parquet-format/pull/474). Before the
> > > > > release, it would be good to discuss whether people think there is
> > > > > value in this.
> > > > >
> > > > >
> > > > >
> > > > > > However, there was a discussion [2] on the requirement of two PoC
> > > > > > reference implementations when promoting a new format change.
> > > > >
> > > > >
> > > > > With regards to Variant implementations, for Java, don't we need the
> > > > > format released before the implementation can be provided (I thought
> > > > > parquet-java consumed a released parquet-format jar in its build)?
> > > > >
> > > > >
> > > > > > However, there was a discussion [2] on the requirement of two PoC
> > > > > > reference implementations when promoting a new format change. There
> > > > > > are also concerns from the variant logical type PR [3] against
> > > > > > parquet-java. This is something to discuss in the community if we
> > > > > > want to make the variant type an exception.
> > > > >
> > > > >
> > > > > I thought the compromise we came to is that the documentation for
> > > > > Variant states that it is still experimental (maybe we should add this
> > > > > as a comment to parquet.thrift as well to make this very clear). I was
> > > > > under the impression that Variant would stay experimental until the two
> > > > > implementations were complete. I think we should clarify the scope of
> > > > > what we consider acceptable for the implementations, but that should
> > > > > probably be a separate thread. I also have some concerns about parts of
> > > > > the current Variant spec after reviewing the initial spec and the
> > > > > proposed shredding simplification [1], which I'll raise on a separate
> > > > > thread.
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > > >
> > > > > [1] https://github.com/apache/parquet-format/pull/461
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Dec 3, 2024 at 10:28 PM Gang Wu <ust...@gmail.com> wrote:
> > > > >
> > > > > > Hi Gene,
> > > > > >
> > > > > > Thanks for your effort on adding the Variant type to parquet-format!
> > > > > > For the next release, I'd like to include the geometry type [1] as
> > > > > > well, which is also targeted for the Iceberg V3 spec. I can volunteer
> > > > > > to be the release manager.
> > > > > >
> > > > > > However, there was a discussion [2] on the requirement of two PoC
> > > > > > reference implementations when promoting a new format change. There
> > > > > > are also concerns from the variant logical type PR [3] against
> > > > > > parquet-java. This is something to discuss in the community if we
> > > > > > want to make the variant type an exception.
> > > > > >
> > > > > > [1] https://github.com/apache/parquet-format/pull/240
> > > > > > [2] https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm
> > > > > > [3] https://github.com/apache/parquet-java/pull/3072
> > > > > >
> > > > > > Best,
> > > > > > Gang
> > > > > >
> > > > > > On Wed, Dec 4, 2024 at 2:08 PM Gene Pang <gene.p...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > We updated parquet-format <https://github.com/apache/parquet-format>
> > > > > > > to include the Variant logical type annotation. Would someone be
> > > > > > > able to release parquet-format (and create the necessary artifacts)
> > > > > > > so that parquet-java can be updated to depend on the new release?
> > > > > > > This would enable adding an implementation in parquet-java.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Gene
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
