Thanks Fokko for bringing this up! Yes, I can be the release manager if the community reaches a consensus.

Best,
Gang

On Mon, Feb 17, 2025 at 6:58 PM Fokko Driesprong <fo...@apache.org> wrote:

> Hey everyone,
>
> I would love to bubble this back up to the top of our mailboxes.
>
> - For Variant, various implementations are in flight: Java in Parquet-Java <https://github.com/apache/parquet-java/pull/3117> and Iceberg-Java <https://github.com/apache/iceberg/pull/12139>, C++ <https://github.com/apache/arrow/pull/45375> in Arrow, Python <https://github.com/apache/spark/blob/master/python/pyspark/sql/variant_utils.py> in Spark, and the Arrow Rust community also expressed interest <https://github.com/apache/arrow-rs/issues/6736>.
> - For Geometry/Geography, we see a C++ PR <https://github.com/apache/arrow/pull/45459> in Arrow and Java in Parquet <https://github.com/apache/parquet-java/pull/2971>, but the vote only passed last week. We also see that geo support has been added to Iceberg <https://github.com/apache/iceberg/pull/10981>.
>
> Both Variant and Geo have been voted on and merged into the format spec. To maintain momentum, I think it would be good to get the thrift definitions and the Java convenience JAR out.
>
> Does anyone have any questions or concerns about getting this out? Gang, you mentioned that you would like to volunteer as release manager, are you still available? :)
>
> Kind regards,
> Fokko
>
> On Thu, Dec 5, 2024 at 5:33 AM Gene Pang <gene.p...@gmail.com> wrote:
>
> > I see, thanks for the clarifications!
> >
> > I will work on porting the Spark Java implementation to parquet-java.
> >
> > Spark also has a (partial) Python implementation for the Variant binary format, but it needs a bit more work to complete.
> >
> > Thanks,
> > Gene
> >
> > On Wed, Dec 4, 2024 at 6:11 AM Andrew Lamb <andrewlam...@gmail.com> wrote:
> >
> > > We also were discussing trying to implement variant in Rust[1], but it was hard due to a lack of other implementations or example data to test against
> > >
> > > Maybe once there is a draft POC for Java, we could whip up something for Rust that did the same
> > >
> > > [1]: https://github.com/apache/arrow-rs/issues/6736
> > >
> > > On Wed, Dec 4, 2024 at 4:57 AM Gang Wu <ust...@gmail.com> wrote:
> > >
> > > > > With regards to Variant implementations, for Java, don't we need the format released before the implementation can be provided (I thought parquet-java consumed a released parquet-format jar in its build)?
> > > >
> > > > For parquet-java, usually the PoC PR is based on a locally built parquet-format with an unreleased version when the spec change is under review. Once the vote has been passed and a new parquet-format is released, the PoC PR gets rebased on the released format for a final review. Below are some examples:
> > > >
> > > > float16: https://github.com/apache/parquet-java/pull/1142
> > > > size stats: https://github.com/apache/parquet-java/pull/1177
> > > > geometry: https://github.com/apache/parquet-java/pull/2971
> > > >
> > > > Best,
> > > > Gang
> > > >
> > > > On Wed, Dec 4, 2024 at 2:57 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > >
> > > > > Hi Gene,
> > > > >
> > > > > Before release, I added a proposal to have a shredding version added to the annotation (https://github.com/apache/parquet-format/pull/474), it would be good to discuss if people think there is value in this.
> > > > >
> > > > > > However, there was a discussion [2] on the requirement of two PoC reference implementations when promoting a new format change.
> > > > >
> > > > > With regards to Variant implementations, for Java, don't we need the format released before the implementation can be provided (I thought parquet-java consumed a released parquet-format jar in its build)?
> > > > >
> > > > > > However, there was a discussion [2] on the requirement of two PoC reference implementations when promoting a new format change. There are also concerns from the variant logical type PR [3] against parquet-java. This is something to discuss in the community if we want to make the variant type an exception.
> > > > >
> > > > > I thought the compromise we came to is that the documentation for Variant states that it is still experimental (maybe we should add this as a comment to parquet.thrift as well to make this very clear). I was under the impression that Variant would stay experimental until the 2 implementations were complete. (I think we should clarify the scope of what we think is acceptable for the implementations, but that should probably be a separate thread.) I also have some concerns about the current variant spec after reviewing the initial spec and the proposed shredding simplification [1], which I'll raise on a separate thread.
> > > > >
> > > > > Thanks,
> > > > > Micah
> > > > >
> > > > > [1] https://github.com/apache/parquet-format/pull/461
> > > > >
> > > > > On Tue, Dec 3, 2024 at 10:28 PM Gang Wu <ust...@gmail.com> wrote:
> > > > >
> > > > > > Hi Gene,
> > > > > >
> > > > > > Thanks for your effort on adding variant type to the parquet-format! For the next release, I'd like to include the geometry type [1] as well which is also targeted for the Iceberg V3 spec. I can volunteer to be the release manager.
> > > > > >
> > > > > > However, there was a discussion [2] on the requirement of two PoC reference implementations when promoting a new format change. There are also concerns from the variant logical type PR [3] against parquet-java. This is something to discuss in the community if we want to make the variant type an exception.
> > > > > >
> > > > > > [1] https://github.com/apache/parquet-format/pull/240
> > > > > > [2] https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm
> > > > > > [3] https://github.com/apache/parquet-java/pull/3072
> > > > > >
> > > > > > Best,
> > > > > > Gang
> > > > > >
> > > > > > On Wed, Dec 4, 2024 at 2:08 PM Gene Pang <gene.p...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > We updated parquet-format <https://github.com/apache/parquet-format> to include the Variant logical type annotation. Would someone be able to release parquet-format (and create the necessary artifacts) so that parquet-java can be updated to depend on the new release? This would enable adding implementation in parquet-java.
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Gene