Meeting notes:
Attendees:
Rok: contributor to Arrow, encryption, Rust
Gabor: Dremio, topic: Variant
Fokko: Databricks
Dan: Databricks, topic: Variant, Geo types
Kenny: hyparquet (js)
Gene: Databricks, topic: Variant
Andrew: InfluxData, Rust Parquet maintainer, DataFusion, topic: Variant in Rust
Ashish: Sumo Logic, listening in
Micah: Google
Neil: Snowflake, Variant C++
Ryan: Databricks, topic: Variant, geo
Aihua: Snowflake, topic: Variant
Dewey: topic: open Geometry PR (C++, Rust)
Nong: Databricks

Agenda/Notes:
- Geo types:
  - Geo implementations:
    - C++: https://github.com/apache/arrow/pull/45459
    - Java: https://github.com/apache/parquet-java/pull/2971
  - Update
    - Geometry
    - Geography: Stats TBD
  - Java:
    - Christian and Fend have been working on the Java implementation
    - Need a release
    - Fuzz testing
    - Getting a lot of feedback. Thanks!
  - Definition of the stats: in Thrift, with clear language.
    - Enable bounding boxes that cross the antimeridian (the 180° line, e.g. Fiji).
    - Don’t want stats that lie. Bad stats, bad data.
- Variant
  - Rust impl: https://github.com/apache/arrow-rs/issues/6736
  - Need: Unblock the Variant annotation in the Java library
    - Finalize outstanding discussions
    - Versioning in the Variant annotation => Action item
  - What’s remaining to finalize the spec?
    - C++ and Java implementations
    - Java impl is in Iceberg, moving to Parquet
  - Impls:
    - 2 working Java implementations
    - Spark Java implementation <https://github.com/apache/spark/tree/master/common/variant/src/main/java/org/apache/spark/types/variant> (binary, shredding)
    - Spark Python implementation <https://github.com/apache/spark/blob/master/python/pyspark/sql/variant_utils.py> (binary)
    - parquet-java implementation PR <https://github.com/apache/parquet-java/pull/3117> (binary)
    - C++ impl <https://github.com/apache/arrow/pull/45375>
    - 2 private ones (Snowflake, Databricks (C++, binary, shredding))
  - Lower priority: How to shred?
    - You cannot add columns after you instantiate the writer.
    - Could extend the writer, but that collides with encryption
      - Adding columns to the Parquet schema in the middle of writing invalidates encryption
    - Shredding released at the same time as the binary Variant.
      - Dangerous to do shredding as a follow-up
    - Tiny PR for the spec: GH-486: Variant object shredding without field shredding <https://github.com/apache/parquet-format/pull/487>
    - Compatibility across implementations => Action item
  - Goal:
    - Combined Variant and shredding release
    - Do we require support for shredding?
      - Variant with shredding is not a separate type.
    - Did we agree to roll them out together?
      - We agree that we want to roll them out together to reduce potential inconsistencies between implementations. => Action item
  - Requirements for considering it ready to release:
    - Need example Parquet data.
    - Versioning of the Variant spec
      - https://github.com/apache/parquet-format/pull/474

Action items
- [ ] Julien, Ryan, Micah, Aihua: Follow up on the email thread about the parquet-format type annotation for shredding, and how we make it easy to work on implementations without fuzzy communication on releases
- [ ] Andrew: Follow up on the cross-implementation testing
- [ ] Micah, Ryan, Dan: Finalize the type annotation versioning discussion on PR 474
- [ ] Ryan: Email about the decision to release shredding with Variant.

On Tue, Mar 4, 2025 at 6:13 PM Julien Le Dem <jul...@apache.org> wrote:

> The next Parquet sync is tomorrow Mar 5th at 9:30am PT - 12:30pm ET -
> 6:30pm CET
> To join the invite:
> https://calendar.app.google/WTQgodyxSmBUimXT8
> Please contact me to be added to the recurring invite. (every two weeks)
> Everybody is welcome, bring your topic or just listen in.
> Best
> Julien
>