Notes

   - Rok: fintech. Arrow contributor, Parquet cpp contributor. Review on
     encryption contribution
   - Julien: Datadog
   - Antoine: formerly Voltron. Interested in Arrow
   - Steve Loughran: Cloudera. Performance, Parquet cloud storage.
   - Micah: Google, listen in, Variant.
   - Alkis: Databricks, improving metadata v3
   - Gene: Databricks. Variant.


Variant PR: https://github.com/apache/parquet-format/pull/456

   - Action Items:
      - Micah to update the Variant disclaimer
      - Micah to review the Variant PR.
   - Time expectation for making Variant official?
      - Variant Binary Encoding is used in Databricks and fairly mature
         - We probably need a Logical Type
         - Need to explain reasoning on spec design and perf measurement
      - Shredding spec is new and not implemented yet
   - Awkward Array:
      - https://awkward-array.org/doc/main/user-guide/how-to-convert-arrow.html
      - Any collaboration opportunity? Opened up
        https://github.com/scikit-hep/awkward/discussions/3282 to ask for
        feedback

Encryption:

   - Feedback requested: https://github.com/apache/arrow/pull/41821
   - _metadata file not encrypted (stats, etc)


Metadata “V3”

   - Alkis is integrating it into the engine at Databricks to get
     performance data.
   - Some refinements in progress.


Hugging Face dedup blog post:
https://huggingface.co/blog/improve_parquet_dedupe

   - Do we need deletion vectors in Parquet?
      - Parquet file that contains 1 deletion-vector column and points to the
        previous file?
      - Parquet having roaring-bitmap implementation?
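
For readers unfamiliar with the idea, here is a minimal sketch of what a
deletion vector does at read time. It is an illustration only, not a proposal
from the meeting: the function and variable names are hypothetical, a plain
Python set stands in for a roaring bitmap, and no Parquet I/O is involved.

```python
from typing import Iterable, Iterator, Set


def apply_deletion_vector(rows: Iterable[dict], deleted: Set[int]) -> Iterator[dict]:
    """Yield only the rows whose position is not marked as deleted.

    `deleted` holds 0-based row positions in the base file. A real
    implementation would likely use a compressed bitmap (for example a
    roaring bitmap); a set is used here purely as a stand-in.
    """
    for position, row in enumerate(rows):
        if position not in deleted:
            yield row


# Hypothetical base file contents and a deletion vector that points at it.
base_rows = [{"id": 10}, {"id": 11}, {"id": 12}, {"id": 13}]
deleted_positions = {1, 3}

live_rows = list(apply_deletion_vector(base_rows, deleted_positions))
```

The base file is never rewritten; the reader merges it with the much smaller
deletion vector, which is why the notes ask whether that vector could live in
its own Parquet file pointing at the previous one.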


Int96 is still useful for capturing dates that cannot be encoded by
int64

   - Should it be removed or replaced?
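
To make the range argument concrete, the sketch below decodes an Int96
timestamp and checks whether it would fit in an int64 nanosecond timestamp. It
assumes the commonly used legacy layout (8 little-endian bytes of nanoseconds
within the day, followed by a 4-byte little-endian Julian day number, read
here as signed); the helper names are hypothetical.

```python
import struct
from typing import Tuple

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 10**9


def decode_int96(raw: bytes) -> Tuple[int, int]:
    """Return (days since the Unix epoch, nanoseconds within that day).

    Assumes the legacy layout: int64 nanos-of-day, then int32 Julian day,
    both little-endian.
    """
    if len(raw) != 12:
        raise ValueError("an Int96 value is exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return julian_day - JULIAN_DAY_OF_UNIX_EPOCH, nanos_of_day


def fits_in_int64_nanos(days: int, nanos_of_day: int) -> bool:
    """Check whether the instant is representable as int64 nanoseconds since the epoch."""
    total = days * NANOS_PER_DAY + nanos_of_day
    return -(2**63) <= total < 2**63
```

An int64 nanosecond count covers only about 292 years on either side of 1970,
so a date such as the year 3000 decodes fine from Int96 but fails
`fits_in_int64_nanos`. Coarser int64 units (microseconds or milliseconds)
cover far wider ranges at lower precision.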


On Wed, Oct 9, 2024 at 7:53 AM Julien Le Dem <jul...@apache.org> wrote:

> The Parquet Sync is happening today at 9:30am PT - 12:30pm ET - 6:30pm
> CET (in ~90 mins). To join the invite:
> https://calendar.app.google/iTNRWZ6KUVBes1k56
> (email to be added to the recurring invite)
> Everybody is welcome, bring your topic or just listen in.
> Best
> Julien
>
