Hi Alkis,

Thanks for the proposal. If you don't mind, could we consolidate this discussion on [1]?
I'll post a link to your doc in a reply I'm about to make on that thread.

[1] https://lists.apache.org/thread/8ykqt4314bqycy7ds4215qbhl3dl60wq

On Wed, May 29, 2024 at 12:36 PM Alkis Evlogimenos <alkis.evlogime...@databricks.com.invalid> wrote:

> Hi folks.
>
> It is great to see the community moving forward with changes to parquet
> metadata to make parquet work better in general and in particular with
> wider schemata.
>
> I have been looking at the current proposals:
> - https://github.com/apache/parquet-format/pull/242
> - https://github.com/apache/parquet-format/pull/248
> - https://github.com/apache/parquet-format/pull/250
>
> and took the consolidated feedback across all of them and put together yet
> another one. Here's the design sketch
> <https://docs.google.com/document/d/1PQpY418LkIDHMFYCY8ne_G-CFpThK15LLpzWYbc7rFU/edit>.
>
> What's different in this proposal is splitting the work into 3 tracks:
> T1. what we can do immediately in the current metadata datastructures
> T2. what we can do short term in the current metadata datastructures
> T3. provide safe and backwards compatible room for experimentation for all
> metadata (including every thrift struct even outside of FileMetaData) so
> that engines can iterate and propose the best format going forward for
> parquet
>
> 3 is important if we strongly believe that we can get the best design
> through testing prototypes on real data and measuring the effects vs
> designing changes in PRs. Along the same lines, I am requesting that you
> ask through your contacts/customers (I will do the same) for scrubbed
> footers of particular interest (wide, deep, etc) so that we can build a set
> of real footers on which we can run benchmarks and drive design decisions.
>
> I am also putting normative PRs out for T1, T2, T3.
>
> Looking forward to your comments.
>