Thank you Fokko.

PR is up: https://github.com/apache/parquet-benchmark/pull/1

On Tue, Aug 20, 2024 at 12:11 AM Julien Le Dem <jul...@apache.org> wrote:

> Thanks Fokko!
>
> On Mon, Aug 19, 2024 at 11:59 AM Fokko Driesprong <fo...@apache.org> wrote:
>
> > Done!
> >
> > Kind regards,
> > Fokko
> >
> > On Mon, Aug 19, 2024 at 20:52, Alkis Evlogimenos
> > <alkis.evlogime...@databricks.com.invalid> wrote:
> >
> > > Hello Julien. I finally got around to compiling binaries for the
> > > benchmarking repo. Can you add an empty README.md in
> > > https://github.com/apache/parquet-benchmark because otherwise I can't
> > > fork an empty repo (!!!).
> > >
> > > Cheers,
> > >
> > > On Wed, Aug 7, 2024 at 12:52 AM Julien Le Dem <jul...@apache.org> wrote:
> > >
> > > > That works for me.
> > > > @Alkis Evlogimenos <alkis.evlogime...@databricks.com> When you open a
> > > > PR on parquet-benchmark, just make it clear how this binary got there
> > > > and that it is an unofficial build from the arrow project waiting for
> > > > an official release.
> > > >
> > > >
> > > >
> > > > On Tue, Aug 6, 2024 at 7:52 AM Rok Mihevc <rok.mih...@gmail.com> wrote:
> > > >
> > > >> That would be a temporary solution until parquet-cpp is released?
> > > >> Seems ok as it's a utility thing.
> > > >>
> > > >> On Tue, Aug 6, 2024 at 4:03 PM Alkis Evlogimenos
> > > >> <alkis.evlogime...@databricks.com.invalid> wrote:
> > > >>
> > > >> > Perhaps it is best to compile static binaries of the above and
> > > >> > upload them to https://github.com/apache/parquet-benchmark along
> > > >> > with a readme?
> > > >> >
> > > >> > On Tue, Aug 6, 2024 at 4:30 PM Rok Mihevc <rok.mih...@gmail.com> wrote:
> > > >> >
> > > >> > > Arrow releases are cut ~every three months and the last release
> > > >> > > was mid July (https://arrow.apache.org/release/17.0.0.html).
> > > >> > > I would speculate 18.0.0 will be public mid September.
> > > >> > >
> > > >> > > On Tue, Aug 6, 2024 at 3:20 PM Alkis Evlogimenos
> > > >> > > <alkis.evlogime...@databricks.com.invalid> wrote:
> > > >> > >
> > > >> > > > Thank you Julien. When can we expect a new arrow package release so
> > > >> > > > that I can compile a doc for customers to donate footers to us?
> > > >> > > >
> > > >> > > > binary in question:
> > > >> > > > https://github.com/apache/arrow/blob/main/cpp/tools/parquet/parquet_dump_footer.cc
> > > >> > > >
> > > >> > > > On Sat, Aug 3, 2024 at 3:17 AM Julien Le Dem <jul...@apache.org> wrote:
> > > >> > > >
> > > >> > > > > Following up on my action item, I have created the parquet-benchmark
> > > >> > > > > repo: https://github.com/apache/parquet-benchmark
> > > >> > > > >
> > > >> > > > > On Wed, Jul 31, 2024 at 3:46 PM Julien Le Dem <jul...@apache.org> wrote:
> > > >> > > > >
> > > >> > > > > > Attendees:
> > > >> > > > > >
> > > >> > > > > >    - Micah: Google, no special topic today
> > > >> > > > > >    - Alkis: Databricks, storage stack. Topic: Parquet extension PR so
> > > >> > > > > >      that we can go in the format. Want to fix the metadata to make it
> > > >> > > > > >      work for wide schemas.
> > > >> > > > > >    - Vinoo: Palantir -> startup in data space. Working on improving the
> > > >> > > > > >      website.
> > > >> > > > > >    - Julien: Datadog. Topic: Make parquet reading possible to be done
> > > >> > > > > >      sequentially (as opposed to footer first)
> > > >> > > > > >    - Rok: Voltron -> freelance in Fintech. Care about Parquet
> > > >> > > > > >      performance. Have time to contribute to footers (“V3”).
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > Follow up items:
> > > >> > > > > >
> > > >> > > > > > Micah’s Parquet format changes process
> > > >> > > > > >
> > > >> > > > > >    - First PR merged, need to finalize java
> > > >> > > > > >    - => Mostly done
> > > >> > > > > >
> > > >> > > > > > Jira -> github migration
> > > >> > > > > >
> > > >> > > > > >    - Getting started with github. Will follow up on the mailing list.
> > > >> > > > > >    - => mostly closed discussion. Some follow up async on the
> > > >> > > > > >      discussion.
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > Agenda:
> > > >> > > > > >
> > > >> > > > > >    - Finalizing [EXTERNAL] Parquet extensions
> > > >> > > > > >      <https://docs.google.com/document/d/1KkoR0DjzYnLQXO-d0oRBv2k157IZU0_injqd4eV4WiI/edit#heading=h.15ohoov5qqm6>
> > > >> > > > > >       - AI: Alkis Evlogimenos <alkis.evlogime...@databricks.com> to
> > > >> > > > > >         update PR with everything in the doc except Alternatives
> > > >> > > > > >         Considered and split the examples in another page.
> > > >> > > > > >    - New footer metadata discussion.
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > Discussion:
> > > >> > > > > >
> > > >> > > > > >    - Extensions:
> > > >> > > > > >       - Add functionality to read/write the extension and show that
> > > >> > > > > >         we can ignore it.
> > > >> > > > > >          - 1: write an extension and read the old footer that ignores
> > > >> > > > > >            it.
> > > >> > > > > >          - 2: write extension and allow reading it back.
> > > >> > > > > >    - New metadata:
> > > >> > > > > >       - Flatbuffer is bigger than thrift: need to optimize metadata
> > > >> > > > > >          - Start from a 1-1 implementation to existing footer and keep
> > > >> > > > > >            iterating 1 commit at a time.
> > > >> > > > > >       - Would like to have a branch in github arrow cpp or a public
> > > >> > > > > >         fork on github to share the prototype.
> > > >> > > > > >       - Add to parquet-tool to print the footer.
> > > >> > > > > >          - Add utility to obfuscate schema so that people can share
> > > >> > > > > >            their metadata without sharing proprietary information.
> > > >> > > > > >          - That way we can have data about slow footers and validate
> > > >> > > > > >            we can read faster with the new footer
> > > >> > > > > >          - => creation of a database of footers.
> > > >> > > > > >       - Getting a feel of what features are used by users.
> > > >> > > > > >          - Alkis would want to share his findings through a blog post.
> > > >> > > > > >       - Also need to make sure the addition of the new footer doesn’t
> > > >> > > > > >         impact old footers too much.
> > > >> > > > > >       - Possibly:
> > > >> > > > > >          - Codspeed for performance testing
> > > >> > > > > >          - Thrift linter: https://github.com/thrift-labs/thrift-fmt
> > > >> > > > > >       - AI:
> > > >> > > > > >          - [Julien] Create a parquet-benchmark repo for a footer db
> > > >> > > > > >            and other things
> > > >> > > > > >             - Example: https://github.com/rok/parquet-benchmark
> > > >> > > > > >          - Alkis to pick where on github to push his prototype branch
> > > >> > > > > >          - Follow up on:
> > > >> > > > > >             - https://github.com/apache/parquet-format/pull/445
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>
