I agree: it makes sense to me that we call parquet-mr the reference
implementation and make implementing a change there a requirement for
evolving the spec.
I would add the requirement to also implement the change in the Parquet C++
implementation that lives in Apache Arrow:
https://github.com/apache/arrow/tree/main/cpp/src/parquet
This code used to live in the parquet-cpp repo in the Parquet project.
Being language agnostic is an important feature of the format.
Interoperability tests between the implementations should also be included.

On Tue, May 14, 2024 at 9:31 AM Antoine Pitrou <anto...@python.org> wrote:

>
> AFAIK, the only Parquet implementation under the Apache Parquet project
> is parquet-mr :-)
>
>
> On Tue, 14 May 2024 10:58:58 +0200
> Rok Mihevc <rok.mih...@gmail.com> wrote:
> > Second Raphael's point.
> > > Would it be reasonable to say a specification change requires implementation
> > > in two Parquet implementations within the Apache Parquet project?
> >
> > Rok
> >
> > > On Tue, May 14, 2024 at 10:50 AM Gang Wu <ustcwg-re5jqeeqqe8avxtiumw...@public.gmane.org> wrote:
> >
> > > > IMHO, it looks more reasonable if a reference implementation is required
> > > > to support most (not all) elements from the specification.
> > >
> > > Another question is: should we discuss (and vote for) each candidate
> > > > one by one? We can start with parquet-mr, which is the most well-known
> > > > implementation.
> > >
> > > Best,
> > > Gang
> > >
> > > On Tue, May 14, 2024 at 4:41 PM Raphael Taylor-Davies
> > > <r.taylordav...@googlemail.com.invalid> wrote:
> > >
> > > > > Potentially it would be helpful to flip the question around. As Andrew
> > > > > articulates, a reference implementation is required to implement all
> > > > > elements from the specification, and therefore the major consequence of
> > > > > labeling parquet-mr thusly would be that any specification change would
> > > > > have to be implemented within parquet-mr as part of the standardisation
> > > > > process. It would be insufficient for it to be implemented in, for
> > > > > example, two of the parquet implementations maintained by the arrow
> > > > > project. I personally think that would be a shame and likely exclude
> > > > > many people who would otherwise be interested in evolving the parquet
> > > > > specification, but think that is at the core of this question.
> > > >
> > > > Kind Regards,
> > > >
> > > > Raphael
> > > >
> > > > On 13/05/2024 20:55, Andrew Lamb wrote:
> > > > > > Question: Should we label parquet-mr or any other parquet implementations
> > > > > > "reference implementations"?
> > > > >
> > > > > This came up as part of Vinoo's great PR to list different parquet
> > > > > reference implementations[1][2].
> > > > >
> > > > > > The term "reference implementation" often has an official connotation. For
> > > > > > example the wikipedia definition is "a program that implements all
> > > > > > requirements from a corresponding specification. The reference
> > > > > > implementation ... should be considered the "correct" behavior of any other
> > > > > > implementation of it."[3]
> > > > >
> > > > > > Given the close association of parquet-mr to the parquet standard, it is a
> > > > > > natural candidate to label as "reference implementation." However, it is
> > > > > > not clear to me if there is consensus that it should be thusly labeled.
> > > > >
> > > > > > I have a strong opinion that a consensus on this question would be very
> > > > > > helpful. I don't actually have a strong opinion about the answer.
> > > > >
> > > > > Andrew
> > > > >
> > > > >
> > > > >
> > > > > [1]:
> > > >
> https://github.com/apache/parquet-site/pull/53#discussion_r1582882267
> > > > > [2]:
> > > >
> https://github.com/apache/parquet-site/pull/53#discussion_r1598283465
> > > > > [3]:  https://en.wikipedia.org/wiki/Reference_implementation
> > > > >
> > > >
> > >
> >
>
>
>
>
