Notes
- Rok: fintech. Arrow contributor, Parquet cpp contributor. Review on encryption contribution
- Julien: Datadog
- Antoine: formerly Voltron. Interested in Arrow
- Steve Loughran: Cloudera. Performance, Parquet cloud storage.
- Micah: Google, listen in, Variant.
- Alkis: Databricks, improving metadata v3
- Gene: Databricks. Variant.

Variant PR: https://github.com/apache/parquet-format/pull/456
- Action Items:
  - Micah to update the Variant disclaimer
  - Micah to review the Variant PR.
- Time expectation for making Variant official?
  - Variant Binary Encoding is used in Databricks and fairly mature
  - We probably need a Logical Type
  - Need to explain reasoning on spec design and perf measurement
  - Shredding spec is new and not implemented yet
- Awkward array:
  - https://awkward-array.org/doc/main/user-guide/how-to-convert-arrow.html
  - Any collaboration opportunity? Opened up https://github.com/scikit-hep/awkward/discussions/3282 to ask for feedback

Encryption:
- Feedback requested: https://github.com/apache/arrow/pull/41821
- _metadata file not encrypted (stats, etc)

Metadata “V3”
- Alkis is integrating in the engine at Databricks to get performance data.
- Some refinements in progress.

Hugging Face dedup blog post: https://huggingface.co/blog/improve_parquet_dedupe
- Do we need deletion vectors in Parquet?
  - Parquet file that contains 1 deletion-vector column and points to the previous file?
  - Parquet having roaring-bitmap implementation?

Int96 has some usefulness of capturing dates that cannot be encoded by int64
- Should it be removed or replaced?

On Wed, Oct 9, 2024 at 7:53 AM Julien Le Dem <jul...@apache.org> wrote:
> The Parquet Sync is happening today at 9:30am PT - 12:30pm ET - 6:30pm
> CET (in ~90 mins)
> To join the invite:
> https://calendar.app.google/iTNRWZ6KUVBes1k56
> (email to be added to the recurring invite)
> Everybody is welcome, bring your topic or just listen in.
> Best
> Julien
>