Hello Julien. I finally got around to compiling binaries for the benchmarking
repo. Can you add an empty README.md to
https://github.com/apache/parquet-benchmark? I can't fork the repo while it
is empty (!!!).

Cheers,

On Wed, Aug 7, 2024 at 12:52 AM Julien Le Dem <jul...@apache.org> wrote:

> That works for me.
> @Alkis Evlogimenos <alkis.evlogime...@databricks.com> When you open a PR
> on parquet-benchmark, just make it clear how this binary got there and that
> it is an unofficial build from the arrow project waiting for an official
> release.
>
>
>
> On Tue, Aug 6, 2024 at 7:52 AM Rok Mihevc <rok.mih...@gmail.com> wrote:
>
>> That would be a temporary solution until parquet-cpp is released? Seems ok
>> as it's a utility thing.
>>
>> On Tue, Aug 6, 2024 at 4:03 PM Alkis Evlogimenos
>> <alkis.evlogime...@databricks.com.invalid> wrote:
>>
>> > Perhaps it is best to compile static binaries of the above and upload to
>> > https://github.com/apache/parquet-benchmark along with a readme?
>> >
>> > On Tue, Aug 6, 2024 at 4:30 PM Rok Mihevc <rok.mih...@gmail.com> wrote:
>> >
>> > > Arrow releases are cut ~every three months and the last release was mid
>> > > July (https://arrow.apache.org/release/17.0.0.html).
>> > > I would speculate 18.0.0 will be public mid September.
>> > >
>> > > On Tue, Aug 6, 2024 at 3:20 PM Alkis Evlogimenos
>> > > <alkis.evlogime...@databricks.com.invalid> wrote:
>> > >
>> > > > Thank you Julien. When can we expect a new arrow package release so that I
>> > > > can compile a doc for customers to donate footers to us?
>> > > >
>> > > > binary in question:
>> > > >
>> > > > https://github.com/apache/arrow/blob/main/cpp/tools/parquet/parquet_dump_footer.cc
>> > > >
>> > > > On Sat, Aug 3, 2024 at 3:17 AM Julien Le Dem <jul...@apache.org> wrote:
>> > > >
>> > > > > Following up on my action item, I have created the parquet-benchmark repo:
>> > > > > https://github.com/apache/parquet-benchmark
>> > > > >
>> > > > > On Wed, Jul 31, 2024 at 3:46 PM Julien Le Dem <jul...@apache.org> wrote:
>> > > > >
>> > > > > > Attendees:
>> > > > > >
>> > > > > >    - Micah: Google, no special topic today
>> > > > > >    - Alkis: Databricks, storage stack. Topic: Parquet extension PR so that
>> > > > > >      we can go in the format. Want to fix the metadata to make it work for
>> > > > > >      wide schemas.
>> > > > > >    - Vinoo: Palantir -> startup in data space. Working on improving the
>> > > > > >      website.
>> > > > > >    - Julien: Datadog. Topic: Make parquet reading possible to be done
>> > > > > >      sequentially (as opposed to footer first)
>> > > > > >    - Rok: Voltron -> freelance in Fintech. Care about Parquet performance.
>> > > > > >      Have time to contribute to footers (“V3”).
>> > > > > >
>> > > > > >
>> > > > > > Follow up items:
>> > > > > >
>> > > > > > Micah’s Parquet format changes process
>> > > > > >
>> > > > > >    - First PR merged, need to finalize java
>> > > > > >    - => Mostly done
>> > > > > >
>> > > > > > Jira -> github migration
>> > > > > >
>> > > > > >    - Getting started with github. Will follow up on the mailing list.
>> > > > > >    - => mostly closed discussion. Some follow up async on the discussion.
>> > > > > >
>> > > > > >
>> > > > > > Agenda:
>> > > > > >
>> > > > > >    - Finalizing [EXTERNAL] Parquet extensions
>> > > > > >      <https://docs.google.com/document/d/1KkoR0DjzYnLQXO-d0oRBv2k157IZU0_injqd4eV4WiI/edit#heading=h.15ohoov5qqm6>
>> > > > > >       - AI: Alkis Evlogimenos <alkis.evlogime...@databricks.com> to update
>> > > > > >         PR with everything in the doc except Alternatives Considered and
>> > > > > >         split the examples in another page.
>> > > > > >    - New footer metadata discussion.
>> > > > > >
>> > > > > >
>> > > > > > Discussion:
>> > > > > >
>> > > > > >    - Extensions:
>> > > > > >       - Add functionality to read/write the extension and show that we can
>> > > > > >         ignore it.
>> > > > > >          - 1: write an extension and read the old footer that ignores it.
>> > > > > >          - 2: write extension and allow reading it back.
>> > > > > >    - New metadata:
>> > > > > >       - Flatbuffer is bigger than thrift: need to optimize metadata
>> > > > > >          - Start from a 1-1 implementation to existing footer and keep
>> > > > > >            iterating 1 commit at a time.
>> > > > > >       - Would like to have a branch in github arrow cpp or a public fork on
>> > > > > >         github to share the prototype.
>> > > > > >       - Add to parquet-tool to print the footer.
>> > > > > >          - Add utility to obfuscate schema so that people can share their
>> > > > > >            metadata without sharing proprietary information.
>> > > > > >          - That way we can have data about slow footers and validate we can
>> > > > > >            read faster with the new footer
>> > > > > >          - => creation of a database of footers.
>> > > > > >       - Getting a feel of what features are used by users.
>> > > > > >          - Alkis would want to share his findings through a blog post.
>> > > > > >       - Also need to make sure the addition of the new footer doesn’t
>> > > > > >         impact old footers too much.
>> > > > > >       - Possibly:
>> > > > > >          - Codspeed for performance testing
>> > > > > >          - Thrift linter: https://github.com/thrift-labs/thrift-fmt
>> > > > > >       - AI:
>> > > > > >          - [Julien] Create a parquet-benchmark repo for a footer db and
>> > > > > >            other things
>> > > > > >             - Example: https://github.com/rok/parquet-benchmark
>> > > > > >          - Alkis to pick where on github to push his prototype branch
>> > > > > >          - Follow up on:
>> > > > > >             - https://github.com/apache/parquet-format/pull/445
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
