I'm curious where the other arrow parquet implementations fit into this, if at all? For context, the original Rust implementation was largely the work of Chao Sun, who I believe to be a parquet PMC member, but it was then donated to the arrow project, and has primarily been developed and maintained by individuals affiliated with the arrow project since then, myself included. I'm not suggesting all parquet implementations necessarily need to be governed by the parquet PMC, but perhaps whatever compromise we devise for parquet-cpp might equally be applied to the other parquet projects that fall under the arrow umbrella?

Kind Regards,

Raphael

On 16/05/2024 13:26, Uwe L. Korn wrote:
I would actually consider someone who contributes to both communities at the 
same time to be a worthwhile addition to both projects. In my active years, we 
have mostly voted people into both projects; the order was not clear, though.

Being a committer/PMC means that you want to bring the community around a 
project forward in the Apache way (with parquet-cpp this is given as it is part 
of the parquet community and also still in a project that is residing within 
the Apache org).

he told me that the contribution to
parquet-cpp is no longer considered when promoting committers to
Apache Parquet PMC.
As a Parquet PMC member, I would strongly object to that and would be
supportive of also making them a Parquet committer/PMC member.

Best
Uwe

On Thu, May 16, 2024, at 2:19 PM, Gang Wu wrote:
Hi,

I share the same feeling with Antoine that parquet-cpp seems to be fully
governed by Apache Arrow PMC, not the Apache Parquet PMC. I have
once discussed this with Xinli and he told me that the contribution to
parquet-cpp is no longer considered when promoting committers to
Apache Parquet PMC.

Best,
Gang

On Thu, May 16, 2024 at 4:29 PM Antoine Pitrou <anto...@python.org> wrote:

On Thu, 16 May 2024 10:08:42 +0200
"Uwe L. Korn" <uw...@xhochy.com> wrote:
On Tue, May 14, 2024, at 6:30 PM, Antoine Pitrou wrote:
AFAIK, the only Parquet implementation under the Apache Parquet project
is parquet-mr :-)
This is not true. The parquet-cpp that resides in the arrow repository
is still controlled by the Apache Parquet PMC. Back then, we only merged
the codebases but kept control of it with the Apache Parquet project. I
know, it is hard to understand, but at least I have never seen a vote that
would move it out of the Apache Parquet's project "control".

Ahah. Unfortunately, this doesn't match actual community practices. For
example, when it is decided to give (Arrow) commit rights to a frequent
Parquet C++ contributor, that decision is made among the Arrow PMC, not
the Parquet PMC.

Perhaps there would be value in aligning the legal situation on the
_de facto_ situation?

Regards

Antoine.


Best
Uwe

On Tue, 14 May 2024 10:58:58 +0200
Rok Mihevc <rok.mih...@gmail.com> wrote:
Second Raphael's point.
Would it be reasonable to say that a specification change requires
implementation in two parquet implementations within the Apache Parquet
project?

Rok

On Tue, May 14, 2024 at 10:50 AM Gang Wu <
ustcwg-re5jqeeqqe8avxtiumw...@public.gmane.org> wrote:
IMHO, it looks more reasonable if a reference implementation is required
to support most (not all) elements from the specification.

Another question is: should we discuss (and vote for) each candidate
one by one? We can start with parquet-mr, which is the most well-known
implementation.

Best,
Gang

On Tue, May 14, 2024 at 4:41 PM Raphael Taylor-Davies
<r.taylordav...@googlemail.com.invalid> wrote:

Potentially it would be helpful to flip the question around. As Andrew
articulates, a reference implementation is required to implement all
elements from the specification, and therefore the major consequence of
labeling parquet-mr thusly would be that any specification change would
have to be implemented within parquet-mr as part of the standardisation
process. It would be insufficient for it to be implemented in, for
example, two of the parquet implementations maintained by the arrow
project. I personally think that would be a shame and likely exclude
many people who would otherwise be interested in evolving the parquet
specification, but think that is at the core of this question.

Kind Regards,

Raphael

On 13/05/2024 20:55, Andrew Lamb wrote:
Question: Should we label parquet-mr or any other parquet implementations
"reference implementations"?

This came up as part of Vinoo's great PR to list different parquet
reference implementations[1][2].

The term "reference implementation" often has an official connotation.
For example the Wikipedia definition is "a program that implements all
requirements from a corresponding specification. The reference
implementation ... should be considered the "correct" behavior of any
other implementation of it."[3]

Given the close association of parquet-mr to the parquet standard, it is
a natural candidate to label as "reference implementation." However, it
is not clear to me if there is consensus that it should be thusly labeled.
I have a strong opinion that a consensus on this question would be very
helpful. I don't actually have a strong opinion about the answer.

Andrew



[1]:
https://github.com/apache/parquet-site/pull/53#discussion_r1582882267
[2]:
https://github.com/apache/parquet-site/pull/53#discussion_r1598283465
[3]:  https://en.wikipedia.org/wiki/Reference_implementation



