That works for me. @Alkis Evlogimenos <alkis.evlogime...@databricks.com>, when you open a PR on parquet-benchmark, just make it clear how this binary got there and that it is an unofficial build from the Arrow project, pending an official release.
On Tue, Aug 6, 2024 at 7:52 AM Rok Mihevc <rok.mih...@gmail.com> wrote:

> That would be a temporary solution until parquet-cpp is released? Seems ok
> as it's a utility thing.
>
> On Tue, Aug 6, 2024 at 4:03 PM Alkis Evlogimenos
> <alkis.evlogime...@databricks.com.invalid> wrote:
>
> > Perhaps it is best to compile static binaries of the above and upload to
> > https://github.com/apache/parquet-benchmark along with a readme?
> >
> > On Tue, Aug 6, 2024 at 4:30 PM Rok Mihevc <rok.mih...@gmail.com> wrote:
> >
> > > Arrow releases are cut ~every three months and the last release was mid
> > > July (https://arrow.apache.org/release/17.0.0.html).
> > > I would speculate 18.0.0 will be public mid September.
> > >
> > > On Tue, Aug 6, 2024 at 3:20 PM Alkis Evlogimenos
> > > <alkis.evlogime...@databricks.com.invalid> wrote:
> > >
> > > > Thank you Julien. When can we expect a new arrow package release so
> > > > that I can compile a doc for customers to donate footers to us?
> > > >
> > > > binary in question:
> > > > https://github.com/apache/arrow/blob/main/cpp/tools/parquet/parquet_dump_footer.cc
> > > >
> > > > On Sat, Aug 3, 2024 at 3:17 AM Julien Le Dem <jul...@apache.org> wrote:
> > > >
> > > > > Following up on my action item, I have created the
> > > > > parquet-benchmark repo:
> > > > > https://github.com/apache/parquet-benchmark
> > > > >
> > > > > On Wed, Jul 31, 2024 at 3:46 PM Julien Le Dem <jul...@apache.org> wrote:
> > > > >
> > > > > > Attendees:
> > > > > >
> > > > > > - Micah: Google, no special topic today
> > > > > > - Alkis: Databricks, storage stack. Topic: Parquet extension PR
> > > > > >   so that we can go in the format. Want to fix the metadata to
> > > > > >   make it work for wide schemas.
> > > > > > - Vinoo: Palantir -> startup in data space. Working on improving
> > > > > >   the website.
> > > > > > - Julien: Datadog. Topic: Make parquet reading possible to be
> > > > > >   done sequentially (as opposed to footer first)
> > > > > > - Rok: Voltron -> freelance in Fintech. Care about Parquet
> > > > > >   performance. Have time to contribute to footers (“V3”).
> > > > > >
> > > > > > Follow up items:
> > > > > >
> > > > > > Mika’s Parquet format changes process
> > > > > >
> > > > > > - First PR merged, need to finalize java
> > > > > > - => Mostly done
> > > > > >
> > > > > > Jira -> github migration
> > > > > >
> > > > > > - Getting started with github. Will follow up on the mailing list.
> > > > > > - => mostly closed discussion. Some follow up async on the
> > > > > >   discussion.
> > > > > >
> > > > > > Agenda:
> > > > > >
> > > > > > - Finalizing [EXTERNAL] Parquet extensions
> > > > > >   <https://docs.google.com/document/d/1KkoR0DjzYnLQXO-d0oRBv2k157IZU0_injqd4eV4WiI/edit#heading=h.15ohoov5qqm6>
> > > > > >   - AI: Alkis Evlogimenos <alkis.evlogime...@databricks.com> to
> > > > > >     update PR with everything in the doc except Alternatives
> > > > > >     Considered and split the examples in another page.
> > > > > > - New footer metadata discussion.
> > > > > >
> > > > > > Discussion:
> > > > > >
> > > > > > - Extensions:
> > > > > >   - Add functionality to read/write the extension and show that
> > > > > >     we can ignore it.
> > > > > >     - 1: write an extension and read the old footer that ignores
> > > > > >       it.
> > > > > >     - 2: write extension and allow reading it back.
> > > > > > - New metadata:
> > > > > >   - Flatbuffer is bigger than thrift: need to optimize metadata
> > > > > >     - Start from a 1-1 implementation to existing footer and keep
> > > > > >       iterating 1 commit at a time.
> > > > > >   - Would like to have a branch in github arrow cpp or a public
> > > > > >     fork on github to share the prototype.
> > > > > >   - Add to parquet-tool to print the footer.
> > > > > >     - Add utility to obfuscate schema so that people can share
> > > > > >       their metadata without sharing proprietary information.
> > > > > >       - That way we can have data about slow footers and validate
> > > > > >         we can read faster with the new footer
> > > > > >       - => creation of a database of footers.
> > > > > >   - Getting a feel of what features are used by users.
> > > > > >     - Alkis would want to share his findings through a blog post.
> > > > > >   - Also need to make sure the addition of the new footer doesn’t
> > > > > >     impact old footers too much.
> > > > > >   - Possibly:
> > > > > >     - Codspeed for performance testing
> > > > > >     - Thrift linter: https://github.com/thrift-labs/thrift-fmt
> > > > > >   - AI:
> > > > > >     - [Julien] Create a parquet-benchmark repo for a footer db
> > > > > >       and other things
> > > > > >       - Example: https://github.com/rok/parquet-benchmark
> > > > > >     - Alkis to pick where on github to push his prototype branch
> > > > > >   - Follow up on:
> > > > > >     - https://github.com/apache/parquet-format/pull/445