This is an automated email from the ASF dual-hosted git repository.

julien pushed a commit to branch proposals
in repository https://gitbox.apache.org/repos/asf/parquet-format.git

commit e0cc257356c4558dbd9942e20f65e58963f2f07b
Author: Julien Le Dem <[email protected]>
AuthorDate: Thu Aug 7 17:00:22 2025 -0700

    start with an example
---
 proposals/1_BASE64_ENCODING.md  | 24 +++++++++++++++++++++
 proposals/README.md             | 47 +++++++++++++++++++++++++++++++++++++++++
 proposals/_PROPOSAL_TEMPLATE.md | 22 +++++++++++++++++++
 3 files changed, 93 insertions(+)

diff --git a/proposals/1_BASE64_ENCODING.md b/proposals/1_BASE64_ENCODING.md
new file mode 100644
index 0000000..cb91ec0
--- /dev/null
+++ b/proposals/1_BASE64_ENCODING.md
@@ -0,0 +1,24 @@
+# Proposal
+
+---
+Author: Julien Le Dem
+Created: 2025-Aug-7
+Name: add BASE64 compression
+Issue: https://github.com/apache/parquet-format/issues/NNN
+Status: ARCHIVED
+Reason: Did not compress
+---
+
+## Description
+Add Base64 to the compression algorithms.
+This is not backwards compatible, since it is a new compression algorithm.
+
+## Spec
+
+See [BASE64 spec].
+
+## Evaluation
+
+After trying it out in the Java implementation, file size doubled on average.
+See prototype [here](https://github.com/julienledem/mypoc)
+
diff --git a/proposals/README.md b/proposals/README.md
new file mode 100644
index 0000000..24445f5
--- /dev/null
+++ b/proposals/README.md
@@ -0,0 +1,47 @@
+# Proposals
+
+## Requirements
+
+See the [requirements document](https://docs.google.com/document/d/1qGDnOyoNyPvcN4FCRhbZGAvp0SfewlWo-WVsai5IKUo/edit?tab=t.0#heading=h.v4emiipkghrx) (Note: this doc would become a markdown page in the repo)
+
+## Proposal lifecycle
+
+Discuss -> Draft -> POC -> Approved -> Implementation -> Release
+
+### Discuss
+Start a [DISCUSS] thread on the mailing list ([email protected]) with your idea.
+Once you have a better idea of the direction, open a GitHub issue using the proposal template.
+You can attach a Google Doc to collaborate on the general idea with the community.
+
+### Draft
+Once the discussion has stabilized and you are ready to start a POC, open a PR that adds a new Markdown file to the proposals folder, giving more visibility to the work in progress.
+
+### POC
+The proposal document can evolve over the course of the POC, in particular to add links to findings and performance evaluations. Collaboration is encouraged: the more validation the POC receives, the better its chances of success.
+
+Make sure you consider the [requirements](#requirements) to ensure the success of the POC.
+
+### Approved for Implementation
+Once the POC has concluded, we should have a clear idea of whether we want to pursue the implementation across the ecosystem. A PMC vote will formalize that stage.
+
+### Implementation
+At this stage, the implementation must meet the contribution guidelines before it is considered finished (e.g. two independent implementations with cross-compatibility tests, an updated spec, ...).
+
+### Release
+Once the implementation phase is finished, we can include the contribution in the next release.
+
+## Active Proposals
+
+| ID | Description | Status |
+| [github issue] | adding this new encoding | POC |
+| [github issue] | add Variant type | Implementation |
+
+## Implemented
+| ID | Description | Status | release it was added |
+| [github issue] | encryption | Completed | x.y.z |
+
+## Archived
+
+| ID | Description | Status | reason for archiving |
+| [github issue] | [adding base64 compression](1_BASE64_ENCODING.md) | Archived | POC showed that compression ratio was not practical |
+
diff --git a/proposals/_PROPOSAL_TEMPLATE.md b/proposals/_PROPOSAL_TEMPLATE.md
new file mode 100644
index 0000000..f33b3ee
--- /dev/null
+++ b/proposals/_PROPOSAL_TEMPLATE.md
@@ -0,0 +1,22 @@
+# Proposal
+
+---
+Author: ~your name~
+Created: ~date~
+Name: *short sentence describing the proposal*
+Issue: https://github.com/apache/parquet-format/issues/NNN
+Status: DRAFT|POC|ACCEPTED|COMPLETED
+---
+
+## Description
+*Short description of the proposal. Is it a new encoding? Is it backwards compatible (old readers will just ignore it)? Is it additional metadata?*
+
+## Spec
+
+At the proposal stage you don't need a fully fleshed-out spec yet.
+Please add links to any relevant documentation, papers, etc.
+At the implementation stage, all the details will need to be clarified.
+
+## Evaluation
+What datasets is it tested on, and what are the success criteria?
+Please add links to the relevant codebase.
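For context on the archived BASE64 proposal: standard Base64 (RFC 4648) maps every 3 input bytes to 4 output characters, so on its own it always makes data about a third larger, whatever the input, and cannot act as a compression codec. The following is a minimal sketch in Python using only the standard-library `base64` module; the helper names `expected_encoded_len` and `base64_size_ratio` are hypothetical illustrations, not part of any Parquet implementation.

```python
import base64
import os


def expected_encoded_len(n: int) -> int:
    """Length of standard padded Base64 output for n input bytes (hypothetical helper)."""
    # Each 3-byte group (the last one padded with '=') becomes 4 ASCII characters.
    return 4 * ((n + 2) // 3)


def base64_size_ratio(data: bytes) -> float:
    """Encoded size divided by raw size; always > 1 for non-empty input."""
    return len(base64.b64encode(data)) / len(data)


if __name__ == "__main__":
    samples = {
        "random (incompressible)": os.urandom(1_000_000),
        "all zeros (highly redundant)": bytes(1_000_000),
    }
    for label, payload in samples.items():
        encoded = base64.b64encode(payload)
        # The encoded length depends only on the input length, never on its content.
        assert len(encoded) == expected_encoded_len(len(payload))
        print(f"{label}: {len(payload)} -> {len(encoded)} bytes "
              f"(ratio {base64_size_ratio(payload):.3f})")
```

Both sample inputs grow by the same factor: Base64 does not exploit redundancy in the data, so used as a codec it can only expand a page, never shrink it.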
