Jun 11, 2025 | Apache Parquet Community Sync
<https://www.google.com/calendar/event?eid=MmZvYnM1cXRoOWQ2aHVwbWRjcTF1azZpdmFfMjAyNTA1MjhUMTcwMDAwWiBqdWxpZW4ubGVkZW1AbQ>

Attendees: Apache Parquet Community Sync
<apache-parquet-community-s...@googlegroups.com>

   - Micah Kornfield: Databricks
   - Talat Uyarer: Google
   - Martin Prammer: CMU, how to coordinate on the VariantType work
   - Aditya Bhatnagar: CMU
   - Neil Chao: Snowflake
   - Prateek Gaur: Snowflake
   - Martin Loncaric: Jane Street, encodings
   - Rok Mihevc: G-Research
   - Sandeep Gottimukkala:


Agenda:

   - Update on Variants in Rust
   - Moderator for next meeting


Notes:

   - Is there a centralized location to discuss design goals for VariantType?
      - There is the spec in parquet-format, but it is not really helpful for
        coordination.
      - Lots of people are working on Variant (Arrow, Parquet):
         - Databricks, Snowflake, CMU
         - Shared framework for Variant builder APIs across frameworks?
      - Rust draft
      - A more focused forum for coordinating Variant work?
      - Micah:
         - Maybe Arrow Rust has a Slack/Discord?
         - We could consider creating a Parquet Discord.
         - Use more emails / go fishing in email.
         - The only really central thing is the spec for the Variant type.
      - No big conflicts have happened yet, but there is a lot of email
        juggling.
      - Java has Variant mostly implemented; C++ and Rust are coming along
        quickly.
   - Encodings
      - Parquet new features
        <https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0>
        - Micah prepared a doc on the process for improving Parquet.
      - Datasets: we might want to drop some or add more based on community
        feedback.
      - Multiple implementations: lots of pros and cons.
      - Concerns about JVM performance; we should benchmark.
      - Pluggable encodings:
         - Wasm: could be downloaded from a trusted source or embedded in the
           file, but would be a security risk and/or increase file size.
         - Custom encoding: Parquet somehow determines the library to use at
           runtime; this would need an extension mechanism built into each
           applicable language.
      - Building out benchmarks:
         - Martin P has some things already built?
         - Martin L has a benchmark tool that covers Parquet as well (though
           it doesn't measure every facet Parquet would want).
   - Extension types (how would these work in the context of DecFloat?)
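
The "shared framework for Variant builder APIs across frameworks" question above could look something like the following minimal Python sketch. Every name here (VariantBuilder, intern, obj, arr, prim) is a hypothetical illustration of the API shape under discussion, not the Parquet Variant spec's binary encoding or any implementation's actual API; the tuple output is a stand-in for the spec's metadata/value split.

```python
# Hypothetical sketch of a Variant builder API shape. Names are illustrative
# assumptions; the output is a toy nested tuple, not the spec's binary layout.

class VariantBuilder:
    """Interns field names into a shared dictionary (akin to the spec's
    "metadata" half) while building a nested value (the "value" half)."""

    def __init__(self):
        self.keys = []  # interned field names, each stored once

    def intern(self, name):
        # Objects reference field names by dictionary id, not by raw string.
        if name not in self.keys:
            self.keys.append(name)
        return self.keys.index(name)

    def obj(self, fields):
        # An object stores (key-id, child) pairs instead of repeated strings.
        return ("object", [(self.intern(k), v) for k, v in fields.items()])

    def arr(self, items):
        return ("array", list(items))

    def prim(self, value):
        return ("primitive", value)


b = VariantBuilder()
v = b.obj({"name": b.prim("parquet"), "tags": b.arr([b.prim(1), b.prim(2)])})
# b.keys is ["name", "tags"]: a repeated field name would still appear once.
```

Agreeing on a surface like this across implementations is the coordination question raised in the notes; the actual on-disk encoding would stay defined solely by the parquet-format spec.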
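
The "custom encoding" option under pluggable encodings can be sketched as a runtime registry. This is a toy illustration of the extension-mechanism idea only; the registry, function names, and the "delta-toy" encoding are all hypothetical and not part of Parquet or any of its libraries.

```python
# Toy sketch of a pluggable-encoding registry: maps an encoding name (as a
# reader might find it in file metadata) to a decoder supplied at runtime.
# All names here are hypothetical, not Parquet APIs.

ENCODING_REGISTRY = {}

def register_encoding(name, decoder):
    """Plug a third-party decoder into the reader at runtime."""
    ENCODING_REGISTRY[name] = decoder

def decode_column(name, raw):
    decoder = ENCODING_REGISTRY.get(name)
    if decoder is None:
        # A real reader would need a defined fallback/failure story here.
        raise ValueError(f"unknown encoding: {name}")
    return decoder(raw)

# Example plug-in: a toy delta decoder (prefix sums of the stored deltas).
register_encoding("delta-toy",
                  lambda deltas: [sum(deltas[:i + 1]) for i in range(len(deltas))])
values = decode_column("delta-toy", [1, 2, 3])  # [1, 3, 6]
```

The notes' caveat applies directly: each language implementation would need its own version of this hook, which is the main cost of the approach relative to Wasm.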
