Notes:

Attendees and topics:

   - Julien, Datadog. Variant test.
   - Adam, GResearch
   - Aditya, CMU. Variant
   - Ashish, Sumo Logic
   - Aihua, Snowflake. Variant
   - Andrew, InfluxData
   - Claire, Spotify. Vector read support in parquet-java:
     https://github.com/apache/parquet-java/pull/1139
   - Dewey, Wherobots
   - Fokko, Databricks
   - Jan, Salesforce. PR on Doubles, IEEE 754 total order
   - Marc, Datadog. Variant
   - Naohiro, Mitsubishi Electric. Encodings: FSST, ALP
   - Raul, QuantStack. arrow, parquet-cpp, pyarrow; new footer
   - Rok, GResearch. pyarrow, arrow-cpp, rust; Flatbuffer metadata
   - Russel, Snowflake


Agenda:

   - New footer:
      - TODO: follow up with alkis.evlogime...@databricks.com on the mailing
        list.
      - Related: as a way to read the footer faster.
         - Rust reader, trying to optimize the thrift decoder
   - Vector read support <https://github.com/apache/parquet-java/pull/1139>
     parquet-java
      - GCS: read vector in
      - M/R dataflow -> reading many parquet files on the same worker.
      - Would help reduce memory pressure.
   - Encodings: FSST, ALP
      - Contributors needed.
      - TODO: Finalize Parquet new features
        <https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0>
   - Variant
      - Test Data with Parquet Logical Types
        <https://github.com/apache/parquet-testing/pull/91/files>
      - Parquet-Java Variant testing
        <https://github.com/apache/parquet-java/pull/3258> against the test
        data
      - Parquet-Go Variant testing
        <https://github.com/apache/arrow-go/pull/455> against the test data
      - Can we vote to finalize the spec?
      - Need to release parquet-java, update Iceberg.
         - Parquet-Go: how to handle some invalid cases? Reader can fail or
           have undefined behavior.
            - Shredded vs unshredded has the same data.
         - We hit some questions when implementing a Go integration test
           https://github.com/apache/arrow-go/pull/455
         - https://github.com/apache/parquet-testing/pull/90
   - PR on Doubles, IEEE 754 total order
     <https://github.com/apache/parquet-format/pull/221>
      - To be discussed at the next meeting.


Action items

   - Fokko to follow up with Alkis on the status of the flatbuffer-based
     footer.
   - Julien, Andrew and Russel to follow up on Micah’s proposal for Parquet
     new features
     <https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0>.


On Tue, Aug 5, 2025 at 8:06 PM Julien Le Dem <jul...@apache.org> wrote:

> The next Parquet sync is tomorrow Aug 6th at 10am PT - 1pm ET - 7pm CET
> I'll facilitate unless someone else wants to do it (feel free to reply to
> this email)
>
> To join the invite, join the group:
> https://groups.google.com/g/apache-parquet-community-sync
>
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>
