Attendees and topics:

   - Micah Databricks: follow up on guidelines for new encodings.
   - Andrew Lamb: InfluxData, Rust Parquet maintainer, discuss Variant binary
   - Kenny: HyParquet, JS.
   - Dewey Whereabout. C++ geometry, feng Java impl. Update on Geo
   - Adam: G-Research OSS, ParquetSharp. Work with Rok, key management tool
     API to Arrow Rust
   - Talat: Google BigQuery
   - Raul: QuantStack, Arrow Parquet C++
   - Rok: KMS question for encryption, Variant
   - Fokko DB
   - Gabor: Dremio. Variant, parquet-java/avro module CVE
   - Dan: Databricks, Variant, geotypes. (Iceberg dep)
   - Martin: CMU, Variant in Rust
   - Jiaying: CMU. Rust Arrow, Parquet
   - Aihua: Snowflake, Variant
   - Gene: Databricks, Variant
   - Steve Loughran

Notes

   - parquet-java/avro
      - We need a proper way to limit risk with reflection
      - Need Avro expertise
      - Option to remove the functionality or force opt-in via a system
        property.
      - dev-list:
        https://lists.apache.org/thread/c91s61tqkbbrc7xj180xh2rx89yx8pfk
      - Avro GitHub issues:
         - https://github.com/apache/parquet-java/issues/3194
         - https://github.com/apache/parquet-java/issues/3195
   - Encryption KMS
      - KMS metadata format not officially standardised but used in
        parquet-java and C++/PyArrow
      - Current spec:
        https://parquet.apache.org/docs/file-format/data-pages/encryption/#43-key-metadata
      - PR to add KMS API to arrow-rs:
        https://github.com/apache/arrow-rs/pull/7387
      - Possibly maintained externally to start with rather than in arrow-rs.
   - Update on geometry types
      - 2 PRs being reviewed
      - All inconsistencies between Java and C++ resolved now
         - Ex: Null
         - Canonical way to represent totally empty things
      - Need Arrow in and out of that.
   - Variant
      - Rust: https://github.com/apache/arrow-rs/pull/7404
         - See epic https://github.com/apache/arrow-rs/issues/6736
         - Discussion on how to fail early when we have an unknown version of
           the Variant spec.
      - Testing Binary compatibility [Andrew]
         - Made a PR with example Binary variants:
           https://github.com/apache/parquet-testing/issues/75
         - Existing implementations:
            - Spark can read/write Variant
            - Iceberg implementation to read Variant Binary (Java)
      - Go: https://github.com/apache/arrow-go/pull/344/files
      - Logical type: https://github.com/apache/parquet-java/pull/3072
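
For illustration only, the "fail early on an unknown Variant version" point
from the Variant notes could look roughly like the sketch below. It assumes
the Variant metadata starts with a one-byte header whose low 4 bits carry the
spec version, and that 1 is the only version a reader understands today. The
function and exception names are hypothetical and are not taken from arrow-rs
or any other implementation.

```python
# Hypothetical sketch: reject Variant metadata with an unknown spec version
# before attempting to decode anything else.

SUPPORTED_VARIANT_VERSION = 1  # assumption: only version 1 is understood
VERSION_MASK = 0x0F            # assumption: version lives in the low 4 bits


class UnsupportedVariantVersion(ValueError):
    """Raised when the metadata header declares a version we cannot read."""


def check_variant_metadata_version(metadata: bytes) -> int:
    """Return the version from the metadata header, or raise if unsupported."""
    if len(metadata) == 0:
        raise ValueError("Variant metadata is empty; no header byte to read")
    version = metadata[0] & VERSION_MASK
    if version != SUPPORTED_VARIANT_VERSION:
        raise UnsupportedVariantVersion(
            f"Variant metadata version {version} is not supported "
            f"(expected {SUPPORTED_VARIANT_VERSION})"
        )
    return version
```

Checking the header once, up front, means a reader never interprets the
dictionary or value bytes under the wrong layout; whether an unknown version
should be a hard error or a skipped column is the open question from the
discussion.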


Action items

   - [Gabor and Steve] to follow up on the list on restricting more Avro
     deserialization.
   - Follow up on adding files generated by different implementations of the
     Variant spec.


On Tue, Apr 15, 2025 at 4:07 PM Julien Le Dem <jul...@apache.org> wrote:

> The next Parquet sync is tomorrow Apr 16th at 10am PT - 1pm ET - 7pm CET
> To join the invite:
> https://calendar.app.google/rCLANWLz1xg69mTL7
> Please contact me to be added to the recurring invite. (every two weeks)
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>
> Best
> Julien
>
