Notes:

Attendees and topics:
- Micah: Databricks
- Julien: Datadog. Variant.
- Gabor: Dremio. Variant. Logical types.
- Martin: Jane Street. Pco compression library <https://github.com/pcodec/pcodec>
- Marc: Datadog, Variant
- Alkis: Databricks
- Gene: DB, Variant
- Nong: DB
- Dan: DB, Variant
- Fokko: DB
- Andrew: Influx Data, Variant, Rust impl
- Dewey: Geometry type in C++. Read unknown logical types without error
- Russel: Snowflake
- Ryan: DB, Variant
- Neil: Snowflake, Variant
- Aihua: Snowflake
- Rok

Topics
- Geometry
  - PRs in review
    - https://github.com/apache/arrow/pull/45459
    - https://github.com/apache/parquet-java/pull/2971
  - Extensibility added
  - Happy with the reviews, it’s nice.
  - Discussion on bounding box calculation:
    - In C++ the WKB parser is there.
    - We don’t write statistics for Geography types today.
- Variant
  - We’re agreeing on releasing a version of parquet-format and parquet-java with the Variant logical type.
    - Implementations should not read versions they don’t know.
    - The release should clearly label Variant as experimental.
    - This will unblock creating example files for cross-implementation testing.
  - Versioning conflicts
    - Versioning PR <https://github.com/apache/parquet-format/pull/474>
  - Rust implementation
    - Once we have the example files for testing we can work on that.
  - Go implementation JIRA <https://github.com/apache/arrow-go/issues/310>
    - Need feedback on how this will work with Arrow.
- Pco compression library <https://github.com/pcodec/pcodec>
  - Pco is good for numeric data but not for strings.
  - What is the selection process to add a new codec?
    - Industrial/Foundation backer
    - Implementations in other languages (Java)
    - What is the support for the codec long term?
      - Backed by a company or a project in an OSS foundation
    - Past example:
      - Brotli:
        - JNI bindings in Java
        - Zstandard became the better adopted compression.
  - Better discussion: What do we pick for a better numeric encoding?
    - How we pick an encoding should be data driven.
    - There is consensus that we need a better numeric encoding.
    - We need a good framework for deciding how we validate the next encoding.
    - Need a good selection strategy for selecting an encoding at write time.

Action items
- Micah: Start the document to collaborate and formalize the process to add a new encoding to the format.

On Tue, Mar 18, 2025 at 5:37 PM Julien Le Dem <jul...@apache.org> wrote:

> The next Parquet sync is tomorrow Mar 19th at 10:00am PT - 1:00pm ET -
> 6:00pm CET
> To join the invite:
> https://calendar.app.google/f3jF625ifA8LLjoZA
> Please contact me to be added to the recurring invite. (every two weeks)
> Everybody is welcome, bring your topic or just listen in.
> Best
> Julien
>