I just wanted to follow up and say thank you, Antoine, for updating the description of your PR and bringing the discussion back to the doc. This is helpful.
https://github.com/apache/parquet-format/pull/242

On Fri, May 17, 2024 at 10:37 AM Julien Le Dem <jul...@apache.org> wrote:

> This context should be added in the PR description itself. My main point
> is to keep the discussion connected rather than starting new threads on
> the mailing list or PRs on github that don't refer to the original doc they
> are connected to.
>
> From a design process perspective, it makes more difficult to converge the
> discussion and build consensus if we start multiple threads rather than
> keeping the discussion on the original thread.
>
> Goals are pretty concrete, but we have to write them down to make them
> clear.
> They are what motivates the change to the metadata. Discussing the changes
> in a PR without agreeing on why we're doing them is premature. Similarly
> before doing benchmarks we need to agree on what we are optimizing for.
>
> PRs
>
>
> On Fri, May 17, 2024 at 1:48 AM Antoine Pitrou <anto...@python.org> wrote:
>
>> Hi Julien,
>>
>> Yes, I posted comments on Micah's document, and I referenced this PR in
>> those discussions. Personally, I feel more comfortable when I have some
>> concrete proposal to comment on, rather than abstract goals, and I
>> figured other people might be like me. Discussing actual Thrift
>> metadata makes it clearer to me where the friction points might reside,
>> and what the opportunities might be.
>>
>> These changes might also later serve as an experimentation platform to
>> run crude benchmarks and try to validate what's really needed for the
>> wide-schema case to be handled efficiently.
>>
>> They are not intended to be submitted for inclusion anytime soon, and
>> I'm not planning to push for them if someone comes up with something
>> better and more thought out.
>>
>> All in all, this started as a personal investigation to understand
>> whether and how a "v3 schema" could be made backwards-compatible, and
>> when I saw that it seemed actually doable I decided it would be worth
>> posting the initial sketch instead of keeping it for myself.
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On Thu, 16 May 2024 18:41:26 -0700
>> Julien Le Dem <jul...@apache.org> wrote:
>> > Hi Antoine,
>> >
>> > On the other thread Micah is collecting feedback in a document.
>> > https://lists.apache.org/thread/61z98xgq2f76jxfjgn5xfq1jhxwm3jwf
>> >
>> > Would you mind putting your feedback there?
>> > We should collect the goals before jumping to solutions.
>> > It is a bit difficult to discuss those directly in the thrift metadata.
>> >
>> > Thank you
>> >
>> >
>> > On Thu, May 16, 2024 at 4:13 AM Antoine Pitrou <antoine-+zn9apsxkcednm+yrof...@public.gmane.org> wrote:
>> >
>> > > Hello,
>> > >
>> > > In the light of recent discussions, I've put up a very rough proposal
>> > > of a Parquet 3 metadata format that allows both for light-weight
>> > > file-level metadata and backwards compatibility with legacy readers.
>> > >
>> > > For the sake of convenience and out of personal preference, I've made
>> > > this a PR to parquet-format rather than a Google Doc:
>> > > https://github.com/apache/parquet-format/pull/242
>> > >
>> > > Feel free to point any glaring mistakes or misunderstandings on my part,
>> > > or to comment on details.
>> > >
>> > > Regards
>> > >
>> > > Antoine.