Day to day, I think having Parquet-cpp under the Apache Arrow project could
make sense, though I worry about two risks:

1. Would that lead to governance of the format itself becoming primarily
the responsibility of the developers of Parquet-MR?
2. Would C++ developers interested in working with Parquet outside of Arrow
recognize it as a relevant library?

On Thu, Feb 2, 2023 at 6:03 AM Neal Richardson <neal.p.richard...@gmail.com>
wrote:

> Would it make sense to transfer all governance of the parquet-cpp
> implementation to Apache Arrow? It seems like that's where we de facto are
> already, so that would resolve these ambiguities and put it in line with
> the Rust implementation.
>
> Would the Parquet PMC be opposed to formalizing this change?
>
> Neal
>
> On Thu, Feb 2, 2023 at 6:48 AM Raphael Taylor-Davies
> <r.taylordav...@googlemail.com.invalid> wrote:
>
> > Hi,
> >
> > > Does the parquet rust implementation have a similar issue?
> >
> > Similar to the C++ implementation, the Rust implementation lives under
> > the Apache Arrow umbrella and does not have any direct affiliation with
> > the Apache Parquet project that I am aware of, beyond using the same
> > format specification. However, as almost all of the users and
> > contributions are with respect to the arrow interfaces, and not the
> > parquet record APIs, there perhaps isn't the same ambiguity as
> > encountered with the C++ implementation. I would expect all issues to be
> > raised in the arrow-rs repository, and a PARQUET Jira only raised,
> > likely by myself or whoever is triaging the issue, if there is some
> > issue/ambiguity pertaining to the format itself.
> >
> > Kind Regards,
> >
> > Raphael
> >
> > On 02/02/2023 01:58, Gang Wu wrote:
> > > Hi Will,
> > >
> > > > AFAIK, the Apache Parquet community no longer considers contributions
> > > > to parquet-cpp when promoting new committers after the donation to
> > > > Apache Arrow.
> > >
> > > > It would be a dilemma for the parquet-cpp contributors if neither the
> > > > Apache Arrow community nor the Apache Parquet community recognizes
> > > > their work.
> > >
> > > Does the parquet rust implementation have a similar issue?
> > >
> > > Best,
> > > Gang
> > >
> > > > On Thu, Feb 2, 2023 at 3:27 AM Will Jones <will.jones...@gmail.com>
> > > > wrote:
> > >
> > >> Hello,
> > >>
> > > >> A while back, the Parquet C++ implementation was merged into the
> > > >> Apache Arrow monorepo [1]. As I understand it, this helped the
> > > >> development process immensely. However, I am noticing some
> > > >> governance issues because of it.
> > >>
> > > >> First, it's not obvious where issues are supposed to be opened: in the
> > > >> Parquet Jira or in Arrow GitHub issues. Looking back at some of the
> > > >> original discussion, it looks like the intention was
> > > >>
> > > >>> * use PARQUET-XXX for issues relating to Parquet core
> > > >>> * use ARROW-XXX for issues relating to Arrow's consumption of Parquet
> > > >>> core (e.g. changes that are in parquet/arrow right now)
> > > >>>
> > > >> The README for the old parquet-cpp repo [3] states instead in its
> > > >> migration note:
> > > >>
> > > >>   JIRA issues should continue to be opened in the PARQUET JIRA project.
> > >>
> > >>
> > > >> Either way, it doesn't seem like this process is obvious to people.
> > > >> Perhaps we could clarify this and add notices to Arrow's GitHub issues
> > > >> template?
> > >>
> > > >> Second, committer status is a little unclear. I am a committer on
> > > >> Arrow, but not on Parquet right now. Does that mean I should only
> > > >> merge Parquet C++ PRs for code changes in parquet/arrow? Or that I
> > > >> shouldn't merge Parquet changes at all?
> > >>
> > > >> Also, are the contributions to Arrow C++ Parquet being actively
> > > >> reviewed for potential new committers?
> > >>
> > >> Best,
> > >>
> > >> Will Jones
> > >>
> > >> [1] https://lists.apache.org/thread/76wzx2lsbwjl363bg066g8kdsocd03rw
> > >> [2] https://lists.apache.org/thread/dkh6vjomcfyjlvoy83qdk9j5jgxk7n4j
> > >> [3] https://github.com/apache/parquet-cpp
> > >>
> >
>
