Notes:

Attendees and topics:
- Julien, Datadog. Variant test.
- Adam, GResearch
- Aditya, CMU, Variant
- Ashish, Sumo Logic
- Aihua, Snowflake, Variant
- Andrew, Influx Data
- Claire, Spotify, Vector read support in parquet-java. https://github.com/apache/parquet-java/pull/1139
- Dewey, Wherobots
- Fokko, Databricks
- Jan, Salesforce, PR on Doubles, IEEE 754 total order
- Marc, Datadog, Variant
- Naohiro, Mitsubishi Electric: encodings, FSST, ALP
- Raul, QuantStack, arrow, parquet-cpp, pyarrow: new footer
- Rok, GResearch, pyarrow, arrow-cpp, rust. Flatbuffer metadata
- Russel, Snowflake

Agenda:
- New footer:
  - TODO: follow up with alkis.evlogime...@databricks.com on the mailing list.
  - Related: as a way to read the footer faster.
  - Rust reader, trying to optimize the thrift decoder
- Vector read support <https://github.com/apache/parquet-java/pull/1139> parquet-java
  - GCS: read vector in
  - M/R dataflow -> reading many parquet files on the same worker.
  - Would help reduce memory pressure.
- Encodings: FSST, ALP
  - Contributors needed.
- TODO: Finalize Parquet new features <https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0>
- Variant
  - Test Data with Parquet Logical Types <https://github.com/apache/parquet-testing/pull/91/files>
  - Parquet-Java Variant testing <https://github.com/apache/parquet-java/pull/3258> against the test data
  - Parquet-GO Variant testing <https://github.com/apache/arrow-go/pull/455> against the test data
  - Can we vote to finalize the spec?
  - Need to release parquet-java, update Iceberg.
  - Parquet-go: how to handle some invalid cases? Reader can fail or have undefined behavior.
  - Shredded vs unshredded has the same data.
  - We hit some questions when implementing a go integration test https://github.com/apache/arrow-go/pull/455
  - https://github.com/apache/parquet-testing/pull/90
- PR on Doubles, IEEE 754 total order <https://github.com/apache/parquet-format/pull/221>
  - To be discussed at the next meeting.

Action items:
- Fokko to follow up with Alkis on the status of the flatbuffer-based footer.
- Julien, Andrew and Russel to follow up on Micah’s proposal for Parquet new features <https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0>.

On Tue, Aug 5, 2025 at 8:06 PM Julien Le Dem <jul...@apache.org> wrote:

> The next Parquet sync is tomorrow Aug 6th at 10am PT - 1pm ET - 7pm CET
> I'll facilitate unless someone else wants to do it (feel free to reply to
> this email)
>
> To join the invite, join the group:
> https://groups.google.com/g/apache-parquet-community-sync
>
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>