Hi Alkis,
Thanks for the proposal. If you don't mind, could we consolidate this
discussion on [1]?

I'll post a link to your doc in a reply I'm about to make on that thread.

[1] https://lists.apache.org/thread/8ykqt4314bqycy7ds4215qbhl3dl60wq

On Wed, May 29, 2024 at 12:36 PM Alkis Evlogimenos
<alkis.evlogime...@databricks.com.invalid> wrote:

> Hi folks.
>
> It is great to see the community moving forward with changes to Parquet
> metadata to make Parquet work better in general, and in particular with
> wider schemata.
>
> I have been looking at the current proposals:
> - https://github.com/apache/parquet-format/pull/242
> - https://github.com/apache/parquet-format/pull/248
> - https://github.com/apache/parquet-format/pull/250
>
> and took the consolidated feedback across all of them and put together yet
> another one. Here's the design sketch:
> https://docs.google.com/document/d/1PQpY418LkIDHMFYCY8ne_G-CFpThK15LLpzWYbc7rFU/edit
>
> What's different in this proposal is that it splits the work into 3 tracks:
> T1. what we can do immediately in the current metadata data structures
> T2. what we can do short term in the current metadata data structures
> T3. provide safe and backwards-compatible room for experimentation for all
> metadata (including every Thrift struct, even outside of FileMetaData) so
> that engines can iterate and propose the best format going forward for
> Parquet
>
> T3 is important if we strongly believe that we can get the best design
> by testing prototypes on real data and measuring the effects, rather than
> by designing changes in PRs. Along the same lines, I am asking that you
> reach out to your contacts/customers (I will do the same) for scrubbed
> footers of particular interest (wide, deep, etc.) so that we can build a set
> of real footers on which to run benchmarks and drive design decisions.
>
> I am also putting out normative PRs for T1, T2, and T3.
>
> Looking forward to your comments.
>
