I opened a Draft PR to illustrate what this could look like.
https://github.com/apache/parquet-format/pull/513

See in tree here:
https://github.com/apache/parquet-format/tree/proposals/proposals

On Wed, Aug 6, 2025 at 3:30 PM Julien Le Dem <jul...@apache.org> wrote:

> IMO, this doc is pretty close to being ready to be published. We can
> always improve it as we go.
>
> I think that one important part of the whole process is to make it easy
> for everyone to see what proposals are ongoing and their status and a clear
> step to move from proposal/evaluation to implementation.
>
> Once we agree the doc is close enough, I would propose to publish it in
> markdown on the parquet-format repo, organized as follows:
> - The section "Baseline Requirements for new additions" as its own page,
>   documenting how to approach the design of a parquet change and the
>   underlying constraints.
> - We add a physical process to list proposals in the parquet-format github
>   Repo as follows.
>   - The steps described in the section "Incorporating encoding/compression
>     improvements" become the process on how someone creates a proposal and
>     starts a POC.
>   - I would complement it by the following steps for people to publish their
>     proposals:
>     - We create a folder in the parquet-format repo to hold the proposals.
>     - a Readme in the folder tracks the ongoing POCs and status.
>     - Initiating a proposal starts with a github issue. We create a
>       template for it based on what's outlined in that section of the doc.
>     - If the discussion concludes that the proposal is worth a POC,
>       the author opens a PR to add the proposal in markdown in the proposals
>       folder. It links to the Github issue where the discussion preceding the
>       proposal occurred. More people can contribute to the POC as needed.
>     - POC and perf evaluation are implemented as part of the proposal.
>     - a vote by the PMC moves the proposal to actual feature in the format
>       (based on the criteria outlined in this doc).
>     - As part of the implementation step, we make sure we have cross
>       compatible implementations as we did for Variant.
> - The section "Measuring improvements" becomes part of that process
>   section to explain how we'll decide if the addition is worth adding to the
>   spec for the complexity it is adding.
>
> If that makes sense to you all, I can draft a PR to make this proposal a
> little more concrete.
>
> On Wed, Aug 6, 2025 at 11:08 AM Andrew Lamb <andrewlam...@gmail.com>
> wrote:
>
>> I would like to bump this thread as it came up again on the parquet sync
>> call today.
>>
>> Specifically, it seems like there is increasing interest in adding new
>> encodings to Parquet, so getting consensus on what that process looks
>> like and what is required is more important.
>>
>> If you are interested in this topic, please leave comments on the Google
>> Doc[1] or reply to this email chain.
>>
>> Thank you,
>> Andrew
>>
>> [1]
>> https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0
>>
>> On Thu, May 29, 2025 at 2:42 AM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>> > I wrote up a long overdue draft
>> > <https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0>
>> > [1]
>> > on how we can move forward with additional features (it provides some
>> > proposed requirements on both consuming third-party code, as well as some
>> > more specific guidance on new encodings, and some orthogonal work that
>> > would be nice to see).
>> >
>> > The doc still lacks some details, and might be too opinionated in places
>> > but I think it serves as a good basis for conversation (and at least gets
>> > me out of the critical path for evolving Parquet).
>> >
>> > I'm very excited to start moving forward with improvements.
>> >
>> > Thanks,
>> > Micah
>> >
>> > [1]
>> > https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0