Notes
- Julien Le Dem <[email protected]> Datadog: follow up on encodings and flatbuff footer, meeting timing
- Rok Mihevc - Arctos Alliance, listening in, curious about flat buffer metadata progress
- Andrew Lamb (InfluxData) - Variant and Geospatial Blogs (would like to hear about ALP progress if any)
- Kenny Daniel - Hyperparam
- Ben Owad - Snowflake - listening
- Jiayi Wang - Databricks - listening
- Aihua Xu - Snowflake - Variant Blog and listening
  The Evolution of Semi-Structured Data: Introducing Variant in Apache Parquet
  <https://docs.google.com/document/d/1ABr3p-xj_8rHQ2kdzzDSceejGkU0nWnriZGPjhDoDBc/edit?tab=t.0>
- Anurag Mantripragada - Apple - Wanted to share Efficient Column Updates in Iceberg
  <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
  and listening in (will have to drop in 30 minutes)
- Arnav Balyan - Uber - FSST Spec
  Parquet FSST Support: Specification
  <https://docs.google.com/document/d/1Xg2b8HR19QnI3nhtQUDWZJhCLwJzW6y9tU1ziiLFZrM/edit?tab=t.0#heading=h.a9r0tnd6fhtq>
- Jiaying Li - CMU - listening
- Russell Spitzer - Snowflake - Column Updates - listening

Notes:
- Announcements:
  - Welcome Andrew to the PMC!
  - Sharing Efficient Column Updates in Iceberg
    <https://docs.google.com/document/d/1Bd7JVzgajA8-DozzeEE24mID_GLuz6iwj0g4TlcVJcs/edit?tab=t.0>
    - A column update is a single Parquet file with a subset of columns
    - Stitching columns on read.
    - Should this be in Iceberg or Parquet? Document describes both options with pros and cons.
      - Russell: Iceberg
    - Sync: Tuesday 9am PT on the Iceberg dev.
      - https://iceberg.apache.org/community/#apache-iceberg-community-calendar
- Parquet:
  - Blogs!
    - Variant: The Evolution of Semi-Structured Data: Introducing Variant in Apache Parquet
      <https://docs.google.com/document/d/1ABr3p-xj_8rHQ2kdzzDSceejGkU0nWnriZGPjhDoDBc/edit?tab=t.0>
    - Geospatial types: Parquet Geo data type blog post
      <https://docs.google.com/document/d/1JPK0F6Vn4sjXGO4AzrkywOjlj_ybDV_6v0zuaFIIjlk/edit?tab=t.0#heading=h.f5ymbunigmpp>
    - Ask: please read and provide feedback if you are interested
    - Once docs have settled down we will turn them into markdown and post to https://parquet.apache.org/
  - Meeting timing
    - Will shift the meeting by a week. Next in 3 weeks.
  - Updates:
    - Encodings
      - FSST: Arnav
        Parquet FSST Support: Specification
        <https://docs.google.com/document/d/1Xg2b8HR19QnI3nhtQUDWZJhCLwJzW6y9tU1ziiLFZrM/edit?tab=t.0#heading=h.a9r0tnd6fhtq>
        - Great comments on the proposal, spec released.
        - Questions:
          - Is the symbol table stored in each page or in the dictionary page?
            - Preferred => dictionary page to start with
        - Spec will need review:
          - Everyone please review!!
          - In particular: Julien Le Dem <[email protected]> [email protected], [email protected]
      - ALP: [Parquet] ALP Spec.docx
        <https://docs.google.com/document/d/1xz2cudDpN2Y1ImFcTXh15s-3fPtD_aWt/edit>
        - All feedback has been addressed
          - Except:
            - Exceptions encoding is still being discussed.
              - TODO: need help finalizing that decision
              - Andrew: I agree that optimizing for a large number of exceptions is not necessary -- ALP is not going to be a good choice in the case where there are a large number of exceptions
            - Version field in the ALP header or not?
              - Goal to:
                - customize integer encoding
                - ALP-RD
              - Question of using a new enum instead?
              - TODO: help finalizing that decision point in the doc
        - Kenny: Note on ALP is that hyparquet has a branch with experimental support, but would really benefit from some example parquet-testing files.
          - Need example files for other implementations.
          - Is it easy to generate files with the C++ implementation?
          - TODO: utility to generate files in C++.
    - flatbuff footer:
      - Jiayi: comments on the spec have been addressed and tested internally
      - TODO:
        - sync in the OSS PR.
        - Encryption is added to the spec but no implementation so far
        - Need review: Rok volunteering.
        - Send final reminder to mailing list

On Tue, Feb 3, 2026 at 4:49 PM Julien Le Dem <[email protected]> wrote:

> The next Parquet sync is tomorrow Wednesday Feb 4th at 10am PT - 1pm ET -
> 7pm CET
>
> To join the invite, join the group:
> https://groups.google.com/g/apache-parquet-community-sync
>
> Everybody is welcome, bring your topic or just listen in.
>
> (Some more details on how the meeting is run:
> https://lists.apache.org/thread/bjdkscmx7zvgfbw0wlfttxy8h6v3f71t )
>
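
On the ALP exceptions item in the notes: a minimal sketch of why the exception rate matters for ALP. It assumes a simplified scheme with a single decimal exponent per vector and is not the layout in the spec under review. The names `alp_encode`, `alp_decode`, and `exponent` are hypothetical and chosen for illustration only. Each double is scaled to an integer; any value that does not round-trip exactly is stored verbatim as an exception, so the more exceptions there are, the less the integer encoding saves.

```python
import math


def alp_encode(values, exponent):
    """Scale doubles by 10**exponent to integers; values that do not
    round-trip exactly are kept verbatim as exceptions (position -> value).
    Simplified: uses == rather than a bit comparison, so -0.0 is not
    distinguished from 0.0."""
    scale = 10 ** exponent
    encoded, exceptions = [], {}
    for position, value in enumerate(values):
        # NaN and infinities cannot be rounded to an int, so they are exceptions.
        candidate = round(value * scale) if math.isfinite(value) else None
        if candidate is not None and candidate / scale == value:
            encoded.append(candidate)
        else:
            encoded.append(0)  # placeholder keeps positions aligned
            exceptions[position] = value
    return encoded, exceptions


def alp_decode(encoded, exceptions, exponent):
    """Invert alp_encode: divide by the scale, then patch in the exceptions."""
    scale = 10 ** exponent
    decoded = [n / scale for n in encoded]
    for position, value in exceptions.items():
        decoded[position] = value
    return decoded


values = [1.23, 4.5, math.pi, 0.07]
encoded, exceptions = alp_encode(values, exponent=2)
restored = alp_decode(encoded, exceptions, exponent=2)
```

In this example only `math.pi` fails to round-trip with two decimal digits, so it is the single exception; a column made mostly of such values would store nearly every double verbatim, which is the case where ALP stops being a good choice.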
