I just wanted to follow up and say thank you, Antoine, for updating the
description of your PR and bringing the discussion back to the doc. This is
helpful.
https://github.com/apache/parquet-format/pull/242

On Fri, May 17, 2024 at 10:37 AM Julien Le Dem <jul...@apache.org> wrote:

> This context should be added in the PR description itself. My main point
> is to keep the discussion connected rather than starting new threads on
> the mailing list or PRs on GitHub that don't refer to the original doc they
> are connected to.
>
> From a design-process perspective, it makes it more difficult to converge
> the discussion and build consensus if we start multiple threads rather than
> keeping the discussion on the original thread.
>
> Goals are pretty concrete, but we have to write them down to make them
> clear.
> They are what motivates the change to the metadata. Discussing the changes
> in a PR without agreeing on why we're doing them is premature. Similarly
> before doing benchmarks we need to agree on what we are optimizing for.
>
> PRs
>
>
> On Fri, May 17, 2024 at 1:48 AM Antoine Pitrou <anto...@python.org> wrote:
>
>>
>> Hi Julien,
>>
>> Yes, I posted comments on Micah's document, and I referenced this PR in
>> those discussions. Personally, I feel more comfortable when I have some
>> concrete proposal to comment on, rather than abstract goals, and I
>> figured other people might be like me. Discussing actual Thrift
>> metadata makes it clearer to me where the friction points might reside,
>> and what the opportunities might be.
>>
>> These changes might also later serve as an experimentation platform to
>> run crude benchmarks and try to validate what's really needed for the
>> wide-schema case to be handled efficiently.
>>
>> They are not intended to be submitted for inclusion anytime soon, and
>> I'm not planning to push for them if someone comes up with something
>> better and more thought out.
>>
>> All in all, this started as a personal investigation to understand
>> whether and how a "v3 schema" could be made backwards-compatible, and
>> when I saw that it seemed actually doable I decided it would be worth
>> posting the initial sketch instead of keeping it to myself.
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On Thu, 16 May 2024 18:41:26 -0700
>> Julien Le Dem <jul...@apache.org> wrote:
>> > Hi Antoine,
>> >
>> > On the other thread Micah is collecting feedback in a document.
>> > https://lists.apache.org/thread/61z98xgq2f76jxfjgn5xfq1jhxwm3jwf
>> >
>> > Would you mind putting your feedback there?
>> > We should collect the goals before jumping to solutions.
>> > It is a bit difficult to discuss those directly in the Thrift metadata.
>> >
>> > Thank you
>> >
>> >
>> > On Thu, May 16, 2024 at 4:13 AM Antoine Pitrou <antoine-+zn9apsxkcednm+yrof...@public.gmane.org> wrote:
>> >
>> > >
>> > > Hello,
>> > >
>> > > In light of recent discussions, I've put up a very rough proposal
>> > > of a Parquet 3 metadata format that allows both for light-weight
>> > > file-level metadata and backwards compatibility with legacy readers.
>> > >
>> > > For the sake of convenience and out of personal preference, I've made
>> > > this a PR to parquet-format rather than a Google Doc:
>> > > https://github.com/apache/parquet-format/pull/242
>> > >
>> > > Feel free to point out any glaring mistakes or misunderstandings on
>> > > my part, or to comment on details.
>> > >
>> > > Regards
>> > >
>> > > Antoine.
>> > >
>> > >
>> > >
>> >
>>
>>
>>
>>
