IMO, this doc is pretty close to being ready to be published. We can always improve it as we go.
I think one important part of the whole process is to make it easy for everyone to see which proposals are ongoing and what their status is, and to have a clear step for moving from proposal/evaluation to implementation.

Once we agree the doc is close enough, I would propose publishing it in markdown on the parquet-format repo, organized as follows:

- The section "Baseline Requirements for new additions" becomes its own page, documenting how to approach the design of a Parquet change and the underlying constraints.
- We add a physical process to list proposals in the parquet-format GitHub repo, as follows:
  - The steps described in the section "Incorporating encoding/compression improvements" become the process for how someone creates a proposal and starts a POC.
  - I would complement it with the following steps for people to publish their proposals:
    - We create a folder in the parquet-format repo to hold the proposals.
    - A README in the folder tracks the ongoing POCs and their status.
    - Initiating a proposal starts with a GitHub issue. We create a template for it based on what's outlined in that section of the doc.
    - If the discussion concludes that the proposal is worth a POC, the author opens a PR to add the proposal in markdown to the proposals folder. It links to the GitHub issue where the discussion preceding the proposal occurred. More people can contribute to the POC as needed.
    - The POC and perf evaluation are implemented as part of the proposal.
    - A vote by the PMC moves the proposal to an actual feature in the format (based on the criteria outlined in this doc).
    - As part of the implementation step, we make sure we have cross-compatible implementations, as we did for Variant.
- The section "Measuring improvements" becomes part of that process section, to explain how we'll decide whether an addition is worth the complexity it adds to the spec.

If that makes sense to you all, I can draft a PR to make this proposal a little more concrete.

On Wed, Aug 6, 2025 at 11:08 AM Andrew Lamb <andrewlam...@gmail.com> wrote:

> I would like to bump this thread as it came up again on the parquet sync
> call today
>
> Specifically, it seems like there is increasing interest in adding new
> encodings to the Parquet, so getting consensus on what that process looks
> like and what is required is more important.
>
> If you are interested in this topic, please leave comments on the Google
> Doc[1] or reply to this email chain.
>
> Thank you,
> Andrew
>
> [1]
> https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0
>
> On Thu, May 29, 2025 at 2:42 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > I wrote up a long overdue draft
> > <https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0>
> > [1]
> > on how we can move forward with additional features (it provides some
> > proposed requirements on both consuming third-party code, as well as some
> > more specific guidance on new encodings, and some orthogonal work that
> > would be nice to see).
> >
> > The doc still lacks some details, and might be too opinionated in places
> > but I think it serves as a good basis for conversation (and at least gets
> > me out of the critical path for evolving Parquet).
> >
> > I'm very excited to start moving forward with improvements.
> >
> > Thanks,
> > Micah
> >
> > [1]
> > https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0