The Variant annotation in the Thrift definition is a struct, so we can easily add a version later. The only reason to add it now is if we want to be able to break forward compatibility with shredding. I'd be fine with adding an encoding/shredding version = 1.
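To make the struct argument concrete: Thrift readers skip fields they don't know, and files written before a new optional field existed simply omit it. A reader can therefore treat an absent version as 1, but only readers that already know about the field can reject a newer version. That is why adding the field now matters if we ever want to break forward compatibility. Below is a minimal Python sketch of that reader-side logic. It assumes a hypothetical optional `specification_version` field and a reader that supports only version 1. `VariantType`, `effective_version` and `check_readable` are illustrative names, not the parquet-format or parquet-java API.

```python
from dataclasses import dataclass
from typing import Optional

# Highest Variant encoding/shredding version this hypothetical reader understands.
SUPPORTED_VERSION = 1


@dataclass
class VariantType:
    """Illustrative stand-in for the Thrift VariantType annotation struct.

    The version field is hypothetical. Files written before such a field
    existed simply leave it unset, which is why an optional field can be
    added to a Thrift struct later without breaking existing files.
    """
    specification_version: Optional[int] = None


def effective_version(annotation: VariantType) -> int:
    # Treat an absent version as version 1 so existing files stay readable.
    if annotation.specification_version is None:
        return 1
    return annotation.specification_version


def check_readable(annotation: VariantType) -> None:
    # A reader that knows about the field rejects newer versions. This is the
    # mechanism that would let a future version (e.g. new shredding rules)
    # deliberately break forward compatibility. Readers that predate the field
    # would skip it entirely and could not do this check.
    version = effective_version(annotation)
    if version > SUPPORTED_VERSION:
        raise ValueError(
            f"Variant version {version} is newer than supported version {SUPPORTED_VERSION}"
        )
```

With this sketch, `check_readable(VariantType())` and `check_readable(VariantType(1))` succeed, while `check_readable(VariantType(2))` raises `ValueError`. The open question in the thread is whether to write version 1 explicitly now or rely on the "absent means 1" default.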
On Thu, Feb 20, 2025 at 4:49 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > Does anyone have any questions or concerns about getting this out? Gang, you mentioned that you would like to volunteer as release manager, are you still available? :)
>
> I think we should close on whether we want versioning (and what sort) for Variant [1] in the Thrift header.
>
> I'd also prefer to wait on releasing Variant until there is an official ratification in Parquet (it seems like it should be close). Otherwise people might get confused about its status if they aren't reading the docs on current support levels carefully.
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/parquet-format/pull/474
>
> On Wed, Feb 19, 2025 at 10:51 PM Gang Wu <ust...@gmail.com> wrote:
> > If there is no objection, I will prepare the release candidate of parquet-format 2.11.0 and send out the vote early next week.
> >
> > On Mon, Feb 17, 2025 at 8:47 PM Gang Wu <ust...@gmail.com> wrote:
> > > Thanks Fokko for bringing this up! Yes, I can be the release manager if the community reaches a consensus.
> > >
> > > Best,
> > > Gang
> > >
> > > On Mon, Feb 17, 2025 at 6:58 PM Fokko Driesprong <fo...@apache.org> wrote:
> > > > Hey everyone,
> > > >
> > > > I would love to bubble this back up to the top of our mailboxes.
> > > >
> > > > - For Variant, various implementations are in flight: Java in Parquet-Java <https://github.com/apache/parquet-java/pull/3117> and Iceberg-Java <https://github.com/apache/iceberg/pull/12139>, C++ <https://github.com/apache/arrow/pull/45375> in Arrow, Python <https://github.com/apache/spark/blob/master/python/pyspark/sql/variant_utils.py> in Spark, and the Arrow Rust community also expressed interest <https://github.com/apache/arrow-rs/issues/6736>.
> > > > - For Geometry/Geography, we see a C++ PR <https://github.com/apache/arrow/pull/45459> in Arrow and Java in Parquet <https://github.com/apache/parquet-java/pull/2971>, but the vote only passed last week. We also see that geo support has been added to Iceberg <https://github.com/apache/iceberg/pull/10981>.
> > > >
> > > > Both Variant and Geo have been voted on and merged into the format spec. To maintain momentum, I think it would be good to get the Thrift definitions and the Java convenience JAR out.
> > > >
> > > > Does anyone have any questions or concerns about getting this out? Gang, you mentioned that you would like to volunteer as release manager, are you still available? :)
> > > >
> > > > Kind regards,
> > > > Fokko
> > > >
> > > > On Thu, Dec 5, 2024 at 05:33 Gene Pang <gene.p...@gmail.com> wrote:
> > > > > I see, thanks for the clarifications!
> > > > >
> > > > > I will work on porting the Spark Java implementation to parquet-java.
> > > > >
> > > > > Spark also has a (partial) Python implementation of the Variant binary format, but it needs a bit more work to complete.
> > > > >
> > > > > Thanks,
> > > > > Gene
> > > > >
> > > > > On Wed, Dec 4, 2024 at 6:11 AM Andrew Lamb <andrewlam...@gmail.com> wrote:
> > > > > > We were also discussing trying to implement Variant in Rust [1], but it was hard due to the lack of other implementations or example data to test against.
> > > > > >
> > > > > > Maybe once there is a draft PoC for Java, we could whip up something for Rust that does the same.
> > > > > >
> > > > > > [1]: https://github.com/apache/arrow-rs/issues/6736
> > > > > >
> > > > > > On Wed, Dec 4, 2024 at 4:57 AM Gang Wu <ust...@gmail.com> wrote:
> > > > > > > > With regards to Variant implementations, for Java, don't we need the format released before the implementation can be provided (I thought parquet-java consumed a released parquet-format jar in its build)?
> > > > > > >
> > > > > > > For parquet-java, the PoC PR is usually based on a locally built parquet-format with an unreleased version while the spec change is under review. Once the vote has passed and a new parquet-format is released, the PoC PR is rebased on the released format for a final review. Below are some examples:
> > > > > > >
> > > > > > > float16: https://github.com/apache/parquet-java/pull/1142
> > > > > > > size stats: https://github.com/apache/parquet-java/pull/1177
> > > > > > > geometry: https://github.com/apache/parquet-java/pull/2971
> > > > > > >
> > > > > > > Best,
> > > > > > > Gang
> > > > > > >
> > > > > > > On Wed, Dec 4, 2024 at 2:57 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > > > > > > Hi Gene,
> > > > > > > >
> > > > > > > > Before the release, I added a proposal to add a shredding version to the annotation (https://github.com/apache/parquet-format/pull/474); it would be good to discuss whether people think there is value in this.
> > > > > > > >
> > > > > > > > > However, there was a discussion [2] on the requirement of two PoC reference implementations when promoting a new format change.
> > > > > > > >
> > > > > > > > With regards to Variant implementations, for Java, don't we need the format released before the implementation can be provided (I thought parquet-java consumed a released parquet-format jar in its build)?
> > > > > > > >
> > > > > > > > > However, there was a discussion [2] on the requirement of two PoC reference implementations when promoting a new format change. There are also concerns from the variant logical type PR [3] against parquet-java. This is something to discuss in the community if we want to make the variant type an exception.
> > > > > > > >
> > > > > > > > I thought the compromise we came to is that the documentation for Variant states that it is still experimental (maybe we should add this as a comment to parquet.thrift as well to make this very clear). I was under the impression that Variant would stay experimental until the 2 implementations were complete. I think we should clarify the scope of what we think is acceptable for the implementations (but that should probably be a separate thread). I also have some concerns about the current Variant spec after reviewing the initial spec and the proposed shredding simplification [1], which I'll raise on a separate thread.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Micah
> > > > > > > >
> > > > > > > > [1] https://github.com/apache/parquet-format/pull/461
> > > > > > > >
> > > > > > > > On Tue, Dec 3, 2024 at 10:28 PM Gang Wu <ust...@gmail.com> wrote:
> > > > > > > > > Hi Gene,
> > > > > > > > >
> > > > > > > > > Thanks for your effort on adding the variant type to parquet-format! For the next release, I'd like to include the geometry type [1] as well, which is also targeted for the Iceberg V3 spec. I can volunteer to be the release manager.
> > > > > > > > >
> > > > > > > > > However, there was a discussion [2] on the requirement of two PoC reference implementations when promoting a new format change. There are also concerns from the variant logical type PR [3] against parquet-java. This is something to discuss in the community if we want to make the variant type an exception.
> > > > > > > > >
> > > > > > > > > [1] https://github.com/apache/parquet-format/pull/240
> > > > > > > > > [2] https://lists.apache.org/thread/f9379yx0lf5gtpkgyv922pvowtzy4kmm
> > > > > > > > > [3] https://github.com/apache/parquet-java/pull/3072
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Gang
> > > > > > > > >
> > > > > > > > > On Wed, Dec 4, 2024 at 2:08 PM Gene Pang <gene.p...@gmail.com> wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > We updated parquet-format <https://github.com/apache/parquet-format> to include the Variant logical type annotation. Would someone be able to release parquet-format (and create the necessary artifacts) so that parquet-java can be updated to depend on the new release? This would enable adding the implementation in parquet-java.
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > > Gene