Notes:

Attendees and topics:

   - Micah: Databricks
   - Julien: Datadog. Variant.
   - Gabor: Dremio. Variant. Logical types.
   - Martin: Jane Street. Pco compression library
     <https://github.com/pcodec/pcodec>
   - Marc: Datadog, Variant
   - Alkis: Databricks
   - Gene: DB, Variant
   - Nong: DB
   - Dan: DB, Variant
   - Fokko: DB
   - Andrew: InfluxData, Variant, Rust impl
   - Dewey: Geometry type in C++. Read unknown logical types without error
   - Russel: Snowflake
   - Ryan: DB, Variant
   - Neil: Snowflake, Variant
   - Aihua: Snowflake
   - Rok


Topics

   - Geometry
      - PRs in review
         - https://github.com/apache/arrow/pull/45459
         - https://github.com/apache/parquet-java/pull/2971
      - Extensibility added
      - Happy with the reviews so far.
      - Discussion on bounding box calculation:
         - The WKB parser is already there in C++.
         - We don’t write statistics for Geography types today.
   - Variant
      - We agreed to release a version of parquet-format and
        parquet-java with the Variant logical type.
         - Implementations should not read versions they don’t know.
         - The release should clearly label Variant as experimental.
         - This will unblock creating example files for cross-implementation
           testing.
      - Versioning conflicts
         - Versioning PR <https://github.com/apache/parquet-format/pull/474>
      - Rust implementation
         - Once we have the example files for testing we can work on that.
      - Go implementation issue <https://github.com/apache/arrow-go/issues/310>
         - Need feedback on how this will work with Arrow.
   - Pco compression library <https://github.com/pcodec/pcodec>
      - Pco is good for numeric data but not for strings.
      - What is the selection process to add a new codec?
         - Industrial/Foundation backer
         - Implementations in other languages (Java)
         - What is the long-term support for the codec?
            - Backed by a company or a project in an OSS foundation
      - Past example:
         - Brotli:
            - JNI bindings in Java
            - Zstandard became the better-adopted compression.
      - Better discussion: What do we pick for a better numeric encoding?
         - How we pick an encoding should be data driven.
         - There is consensus that we need a better numeric encoding.
         - We need a good framework for deciding how we validate the next
           encoding.
         - We need a good strategy for selecting an encoding at write
           time.



Action items

   - Micah: Start the document to collaborate and formalize the process to
     add a new encoding to the format.


On Tue, Mar 18, 2025 at 5:37 PM Julien Le Dem <jul...@apache.org> wrote:

> The next Parquet sync is tomorrow Mar 19th at 10:00am PT - 1:00pm ET -
> 6:00pm CET
> To join the invite:
> https://calendar.app.google/f3jF625ifA8LLjoZA
> Please contact me to be added to the recurring invite. (every two weeks)
> Everybody is welcome, bring your topic or just listen in.
> Best
> Julien
>
