On Tue, May 14, 2024, at 6:30 PM, Antoine Pitrou wrote:
> AFAIK, the only Parquet implementation under the Apache Parquet project
> is parquet-mr :-)

This is not true. The parquet-cpp that resides in the arrow repository is still
controlled by the Apache Parquet PMC. Back then, we only merged the codebases
but kept control of it with the Apache Parquet project. I know it is hard to
understand, but at least I have never seen a vote that would move it out of the
Apache Parquet project's "control".

Best
Uwe
>
>
> On Tue, 14 May 2024 10:58:58 +0200
> Rok Mihevc <rok.mih...@gmail.com> wrote:
>> Second Raphael's point.
>> Would it be reasonable to say a specification change requires implementation
>> in two parquet implementations within the Apache Parquet project?
>> 
>> Rok
>> 
>> On Tue, May 14, 2024 at 10:50 AM Gang Wu 
>> <ustcwg-re5jqeeqqe8avxtiumw...@public.gmane.org> wrote:
>> 
>> > IMHO, it looks more reasonable if a reference implementation is required
>> > to support most (not all) elements from the specification.
>> >
>> > Another question is: should we discuss (and vote for) each candidate
>> > one by one? We can start with parquet-mr, which is the most well-known
>> > implementation.
>> >
>> > Best,
>> > Gang
>> >
>> > On Tue, May 14, 2024 at 4:41 PM Raphael Taylor-Davies
>> > <r.taylordav...@googlemail.com.invalid> wrote:
>> >  
>> > > Potentially it would be helpful to flip the question around. As Andrew
>> > > articulates, a reference implementation is required to implement all
>> > > elements from the specification, and therefore the major consequence of
>> > > labeling parquet-mr thusly would be that any specification change would
>> > > have to be implemented within parquet-mr as part of the standardisation
>> > > process. It would be insufficient for it to be implemented in, for
>> > > example, two of the parquet implementations maintained by the arrow
>> > > project. I personally think that would be a shame and likely exclude
>> > > many people who would otherwise be interested in evolving the parquet
>> > > specification, but think that is at the core of this question.
>> > >
>> > > Kind Regards,
>> > >
>> > > Raphael
>> > >
>> > > On 13/05/2024 20:55, Andrew Lamb wrote:  
>> > > > Question: Should we label parquet-mr or any other parquet implementations
>> > > > "reference implementations"?
>> > > >
>> > > > This came up as part of Vinoo's great PR to list different parquet
>> > > > reference implementations[1][2].
>> > > >
>> > > > The term "reference implementation" often has an official connotation. For
>> > > > example the wikipedia definition is "a program that implements all
>> > > > requirements from a corresponding specification. The reference
>> > > > implementation ... should be considered the "correct" behavior of any other
>> > > > implementation of it."[3]
>> > > >
>> > > > Given the close association of parquet-mr to the parquet standard, it is a
>> > > > natural candidate to label as "reference implementation." However, it is
>> > > > not clear to me if there is consensus that it should be thusly labeled.
>> > > >
>> > > > I have a strong opinion that a consensus on this question would be very
>> > > > helpful. I don't actually have a strong opinion about the answer.
>> > > >
>> > > > Andrew
>> > > >
>> > > >
>> > > >
>> > > > [1]: https://github.com/apache/parquet-site/pull/53#discussion_r1582882267
>> > > > [2]: https://github.com/apache/parquet-site/pull/53#discussion_r1598283465
>> > > > [3]: https://en.wikipedia.org/wiki/Reference_implementation
>> > > >  
>> > >  
>> >  
>>
