I opened a Draft PR to illustrate what this could look like.
https://github.com/apache/parquet-format/pull/513
See in tree here:
https://github.com/apache/parquet-format/tree/proposals/proposals

On Wed, Aug 6, 2025 at 3:30 PM Julien Le Dem <jul...@apache.org> wrote:

> IMO, this doc is pretty close to being ready to be published. We can
> always improve it as we go.
>
> I think one important part of the whole process is to make it easy for
> everyone to see which proposals are ongoing and their status, and to
> define a clear step for moving from proposal/evaluation to implementation.
>
> Once we agree the doc is close enough, I would propose to publish it in
> markdown on the parquet-format repo, organized as follows:
> - The section "Baseline Requirements for new additions" as its own page,
> documenting how to approach the design of a parquet change and the
> underlying constraints.
> - We add a physical process to list proposals in the parquet-format GitHub
> repo, as follows.
> - The steps described in the section "Incorporating encoding/compression
> improvements" become the process on how someone creates a proposal and
> starts a POC.
> - I would complement it by the following steps for people to publish their
> proposals:
>    - We create a folder in the parquet-format repo to hold the proposals.
>    - A README in the folder tracks the ongoing POCs and their status.
>    - Initiating a proposal starts with a GitHub issue. We create a
> template for it based on what's outlined in that section of the doc.
>    - If the discussion concludes that the proposal is worth a POC,
> the author opens a PR to add the proposal in markdown in the proposals
> folder. It links to the GitHub issue where the discussion preceding the
> proposal occurred. More people can contribute to the POC as needed.
>    - POC and perf evaluation are implemented as part of the proposal.
>    - A vote by the PMC moves the proposal to an actual feature in the format
> (based on the criteria outlined in this doc).
>    - As part of the implementation step, we make sure we have
> cross-compatible implementations, as we did for Variant.
> - The section "Measuring improvements" becomes part of that process
> section, to explain how we'll decide whether an addition is worth the
> complexity it adds to the spec.
>
> If that makes sense to you all, I can draft a PR to make this proposal a
> little more concrete.
>
>
>
> On Wed, Aug 6, 2025 at 11:08 AM Andrew Lamb <andrewlam...@gmail.com>
> wrote:
>
>> I would like to bump this thread as it came up again on the parquet sync
>> call today.
>>
>> Specifically, it seems like there is increasing interest in adding new
>> encodings to Parquet, so getting consensus on what that process looks
>> like and what is required is becoming more important.
>>
>> If you are interested in this topic, please leave comments on the Google
>> Doc[1] or reply to this email chain.
>>
>> Thank you,
>> Andrew
>>
>> [1] https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0
>>
>> On Thu, May 29, 2025 at 2:42 AM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>> > I wrote up a long overdue draft [1] on how we can move forward with
>> > additional features (it provides some proposed requirements on both
>> > consuming third-party code, as well as some more specific guidance on
>> > new encodings, and some orthogonal work that would be nice to see).
>> >
>> > The doc still lacks some details, and might be too opinionated in places,
>> > but I think it serves as a good basis for conversation (and at least gets
>> > me out of the critical path for evolving Parquet).
>> >
>> > I'm very excited to start moving forward with improvements.
>> >
>> > Thanks,
>> > Micah
>> >
>> > [1] https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0
>> >
>>
>
