Hey everyone, I would love to bubble this back up to the top of our mailboxes.
- For Variant, various implementations are in flight: Java in Parquet-Java <https://github.com/apache/parquet-java/pull/3117> and Iceberg-Java <https://github.com/apache/iceberg/pull/12139>, C++ <https://github.com/apache/arrow/pull/45375> in Arrow, Python <https://github.com/apache/spark/blob/master/python/pyspark/sql/variant_utils.py> in Spark, and the Arrow Rust community has also expressed interest <https://github.com/apache/arrow-rs/issues/6736>.
- For Geometry/Geography, there is a C++ PR <https://github.com/apache/arrow/pull/45459> in Arrow and a Java PR in Parquet-Java <https://github.com/apache/parquet-java/pull/2971>, although the vote only passed last week. Geo support has also been added to Iceberg <https://github.com/apache/iceberg/pull/10981>.

Both Variant and Geo have been voted on and merged into the format spec. To maintain momentum, I think it would be good to get the Thrift definitions and the Java convenience JAR out. Does anyone have any questions or concerns about releasing this? Gang, you mentioned that you would like to volunteer as release manager; are you still available? :)

Kind regards,
Fokko

On Thu, Dec 5, 2024 at 05:33 Gene Pang <gene.p...@gmail.com> wrote:

> I see, thanks for the clarifications!
>
> I will work on porting the Spark Java implementation to parquet-java.
>
> Spark also has a (partial) Python implementation for the Variant binary
> format, but it needs a bit more work to complete.
>
> Thanks,
> Gene
>
> On Wed, Dec 4, 2024 at 6:11 AM Andrew Lamb <andrewlam...@gmail.com> wrote:
>
> > We were also discussing trying to implement variant in Rust [1], but it
> > was hard due to a lack of other implementations or example data to test
> > against.
> >
> > Maybe once there is a draft POC for Java, we could whip up something for
> > Rust that does the same.
> >
> > [1]: https://github.com/apache/arrow-rs/issues/6736
> >
> > On Wed, Dec 4, 2024 at 4:57 AM Gang Wu <ust...@gmail.com> wrote:
> >
> > > > With regards to Variant implementations, for Java, don't we need the
> > > > format released before the implementation can be provided (I thought
> > > > parquet-java consumed a released parquet-format jar in its build)?
> > >
> > > For parquet-java, the PoC PR is usually based on a locally built
> > > parquet-format with an unreleased version while the spec change is
> > > under review. Once the vote has passed and a new parquet-format is
> > > released, the PoC PR gets rebased on the released format for a final
> > > review. Below are some examples:
> > >
> > > float16: https://github.com/apache/parquet-java/pull/1142
> > > size stats: https://github.com/apache/parquet-java/pull/1177
> > > geometry: https://github.com/apache/parquet-java/pull/2971
> > >
> > > Best,
> > > Gang
> > >
> > > On Wed, Dec 4, 2024 at 2:57 PM Micah Kornfield <emkornfi...@gmail.com>
> > > wrote:
> > >
> > > > Hi Gene,
> > > >
> > > > Before release, I added a proposal to have a shredding version added
> > > > to the annotation (https://github.com/apache/parquet-format/pull/474);
> > > > it would be good to discuss whether people think there is value in
> > > > this.
> > > >
> > > > > However, there was a discussion [2] on the requirement of two PoC
> > > > > reference implementations when promoting a new format change.
> > > >
> > > > With regards to Variant implementations, for Java, don't we need the
> > > > format released before the implementation can be provided (I thought
> > > > parquet-java consumed a released parquet-format jar in its build)?
> > > >
> > > > > However, there was a discussion [2] on the requirement of two PoC
> > > > > reference implementations when promoting a new format change. There
> > > > > are also concerns from the variant logical type PR [3] against
> > > > > parquet-java. This is something to discuss in the community if we
> > > > > want to make the variant type an exception.
> > > >
> > > > I thought the compromise we came to is that the documentation for
> > > > Variant states that it is still experimental (maybe we should add this
> > > > as a comment to parquet.thrift as well to make this very clear). I was
> > > > under the impression that Variant would stay experimental until the
> > > > two implementations were complete. I think we should clarify the scope
> > > > of what we consider acceptable for the implementations (but that
> > > > should probably be a separate thread). I also have some concerns about
> > > > the current variant spec after reviewing the initial spec and the
> > > > proposed shredding simplification [1], which I'll raise on a separate
> > > > thread.
> > > >
> > > > Thanks,
> > > > Micah
> > > >
> > > > [1] https://github.com/apache/parquet-format/pull/461
> > > >
> > > > On Tue, Dec 3, 2024 at 10:28 PM Gang Wu <ust...@gmail.com> wrote:
> > > >
> > > > > Hi Gene,
> > > > >
> > > > > Thanks for your effort on adding the variant type to parquet-format!
> > > > > For the next release, I'd like to include the geometry type [1] as
> > > > > well, which is also targeted for the Iceberg V3 spec. I can volunteer
> > > > > to be the release manager.
> > > > >
> > > > > However, there was a discussion [2] on the requirement of two PoC
> > > > > reference implementations when promoting a new format change. There
> > > > > are also concerns from the variant logical type PR [3] against
> > > > > parquet-java. This is something to discuss in the community if we
> > > > > want to make the variant type an exception.
> > > > >
> > > > > [1] https://github.com/apache/parquet-format/pull/240
> > > > > [2] https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm
> > > > > [3] https://github.com/apache/parquet-java/pull/3072
> > > > >
> > > > > Best,
> > > > > Gang
> > > > >
> > > > > On Wed, Dec 4, 2024 at 2:08 PM Gene Pang <gene.p...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > We updated parquet-format <https://github.com/apache/parquet-format>
> > > > > > to include the Variant logical type annotation. Would someone be
> > > > > > able to release parquet-format (and create the necessary artifacts)
> > > > > > so that parquet-java can be updated to depend on the new release?
> > > > > > This would enable adding the implementation in parquet-java.
> > > > > >
> > > > > > Thanks!
> > > > > > Gene