Thank you, Julien. When can we expect a new Arrow package release, so that I
can put together a doc asking customers to donate footers to us?

Binary in question:
https://github.com/apache/arrow/blob/main/cpp/tools/parquet/parquet_dump_footer.cc
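Some context on what "donating a footer" involves. A Parquet file ends with the footer, which is the Thrift compact-encoded FileMetaData. That is followed by the footer's length as a 4-byte little-endian integer, and then by the magic bytes `PAR1`. The Python sketch below is not parquet_dump_footer. It does not decode, scrub, or reproduce that tool's output. It only uses the layout facts above to show how a reader finds the raw footer bytes by reading the tail of the file first (the "footer first" access mentioned in the notes). The function name `read_footer` is hypothetical.

```python
import io
import os
import struct

MAGIC = b"PAR1"


def read_footer(f) -> bytes:
    """Return the raw Thrift-encoded FileMetaData from a seekable binary file.

    Parquet layout at end of file: <FileMetaData> <u32 LE length> "PAR1",
    so only the last 8 bytes are needed to locate the footer.
    """
    size = f.seek(0, os.SEEK_END)
    if size < 12:  # leading "PAR1" + 4-byte length + trailing "PAR1"
        raise ValueError("file too small to be Parquet")
    f.seek(size - 8)
    tail = f.read(8)
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError("footer length exceeds file size")
    f.seek(size - 8 - footer_len)
    return f.read(footer_len)


if __name__ == "__main__":
    # Synthetic file: the "footer" here is placeholder bytes, not real metadata.
    fake_footer = b"\x15\x00"
    blob = MAGIC + b"<column chunks>" + fake_footer + struct.pack("<I", len(fake_footer)) + MAGIC
    print(read_footer(io.BytesIO(blob)))
```

For a real file, pass `open(path, "rb")` instead of the in-memory buffer. The returned bytes are still compact-protocol Thrift, so decoding them requires a Thrift implementation.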

On Sat, Aug 3, 2024 at 3:17 AM Julien Le Dem <jul...@apache.org> wrote:

> Following up on my action item, I have created the parquet-benchmark repo:
> https://github.com/apache/parquet-benchmark
>
> On Wed, Jul 31, 2024 at 3:46 PM Julien Le Dem <jul...@apache.org> wrote:
>
> > Attendees:
> >
> >    - Micah: Google, no special topic today.
> >    - Alkis: Databricks, storage stack. Topic: the Parquet extensions PR, so
> >      that it can go into the format. Wants to fix the metadata so that it
> >      works for wide schemas.
> >    - Vinoo: Palantir -> startup in the data space. Working on improving
> >      the website.
> >    - Julien: Datadog. Topic: make it possible to read Parquet files
> >      sequentially (as opposed to footer first).
> >    - Rok: Voltron -> freelance in fintech. Cares about Parquet performance.
> >      Has time to contribute to footers (“V3”).
> >
> >
> > Follow-up items:
> >
> > Micah’s Parquet format changes process
> >
> >    - First PR merged; need to finalize Java.
> >    - => Mostly done.
> >
> > Jira -> GitHub migration
> >
> >    - Getting started with GitHub. Will follow up on the mailing list.
> >    - => Discussion mostly closed; some async follow-up remains.
> >
> >
> > Agenda:
> >
> >    - Finalizing [EXTERNAL] Parquet extensions
> >      <https://docs.google.com/document/d/1KkoR0DjzYnLQXO-d0oRBv2k157IZU0_injqd4eV4WiI/edit#heading=h.15ohoov5qqm6>
> >       - AI: Alkis Evlogimenos <alkis.evlogime...@databricks.com> to update
> >         the PR with everything in the doc except Alternatives Considered,
> >         and split the examples into a separate page.
> >    - New footer metadata discussion.
> >
> >
> > Discussion:
> >
> >    - Extensions:
> >       - Add functionality to read/write the extension and show that readers
> >         can ignore it.
> >          - 1: write an extension and check that a reader of the old footer
> >            ignores it.
> >          - 2: write an extension and allow reading it back.
> >    - New metadata:
> >       - The Flatbuffers footer is bigger than the Thrift one: need to
> >         optimize the metadata.
> >          - Start from a 1:1 translation of the existing footer and keep
> >            iterating one commit at a time.
> >       - Would like a branch in the Arrow C++ repo on GitHub, or a public
> >         fork on GitHub, to share the prototype.
> >       - Add the ability to print the footer to parquet-tool.
> >          - Add a utility to obfuscate the schema so that people can share
> >            their metadata without sharing proprietary information.
> >          - That way we can collect data about slow footers and validate
> >            that the new footer reads faster.
> >          - => Creation of a database of footers.
> >       - Get a feel for which features users actually use.
> >          - Alkis would like to share his findings in a blog post.
> >       - Also need to make sure that adding the new footer doesn't impact
> >         old footers too much.
> >       - Possibly:
> >          - CodSpeed for performance testing
> >          - Thrift linter: https://github.com/thrift-labs/thrift-fmt
> >       - AI:
> >          - [Julien] Create a parquet-benchmark repo for a footer db and
> >            other things.
> >             - Example: https://github.com/rok/parquet-benchmark
> >          - Alkis to pick where on GitHub to push his prototype branch.
> >          - Follow up on:
> >             - https://github.com/apache/parquet-format/pull/445
> >
> >
>
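The notes report that the Flatbuffers footer came out bigger than the Thrift one. Here is one plausible contributor. It is an assumption, not a finding from the meeting. Parquet's Thrift footer uses the compact protocol, which writes integers as zigzag-encoded varints. FlatBuffers instead stores scalars at fixed width and adds vtable and offset overhead. This sketch counts only the integer payload bytes for a hypothetical list of i64 values such as offsets and sizes. It ignores Thrift field headers and FlatBuffers vtables, so it illustrates the effect rather than measuring it.

```python
def zigzag64(n: int) -> int:
    """ZigZag-map a signed 64-bit int so small magnitudes become small unsigned values."""
    return (n << 1) ^ (n >> 63)


def varint_len(u: int) -> int:
    """Bytes needed for the unsigned LEB128 (varint) encoding of u."""
    length = 1
    while u >= 0x80:
        u >>= 7
        length += 1
    return length


def compact_i64_bytes(values) -> int:
    """Payload bytes for i64 values as written by Thrift's compact protocol."""
    return sum(varint_len(zigzag64(v)) for v in values)


def fixed_i64_bytes(values) -> int:
    """Payload bytes for the same values stored as fixed-width 8-byte scalars."""
    return 8 * len(values)


if __name__ == "__main__":
    # Hypothetical sample: small offset, page size, row count, large file offset.
    sample = [4, 1024, 1_000_000, 3_000_000_000]
    print(compact_i64_bytes(sample), fixed_i64_bytes(sample))  # 11 32
```

For this sample, varints need 11 bytes where fixed width needs 32. That hints at why a 1:1 translation can grow. Real footers, such as the ones collected for the footer database, are the way to confirm it.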

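The notes also propose a utility to obfuscate the schema before a footer is shared. Below is a minimal sketch of one possible approach, assuming the column paths have already been extracted from the footer as lists of name segments. It replaces each segment with a keyed HMAC-SHA256 token. The same name always maps to the same token, so nesting and shared prefixes are preserved while the names themselves are not revealed. `obfuscate_paths` and the `c_` token format are hypothetical, not an existing Arrow or Parquet API.

```python
import hashlib
import hmac
import secrets


def obfuscate_paths(paths, key=None):
    """Replace every name segment of each column path with a keyed hash token.

    `paths` is a list of column paths, each a list of name segments
    (e.g. [["user", "id"], ["user", "email"]]). For a given key the same
    segment always yields the same token. Hypothetical helper.
    """
    if key is None:
        key = secrets.token_bytes(32)  # keep private; never share with the footer

    def token(segment: str) -> str:
        digest = hmac.new(key, segment.encode("utf-8"), hashlib.sha256).hexdigest()
        return "c_" + digest[:12]

    return [[token(segment) for segment in path] for path in paths]


if __name__ == "__main__":
    schema = [["user", "id"], ["user", "email"], ["event_time"]]
    for path in obfuscate_paths(schema):
        print(".".join(path))
```

Column names are not the only sensitive content in a footer. Column statistics (min/max values) and key-value metadata can also carry proprietary data, and this sketch does not touch them.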