Day to day, I think having Parquet-cpp under the Apache Arrow project could make sense, though I worry about two risks:
1. Would that lead to governance of the format itself becoming primarily the
   responsibility of the developers of Parquet-MR?
2. Would C++ developers interested in working with Parquet outside of Arrow
   recognize it as a relevant library?

On Thu, Feb 2, 2023 at 6:03 AM Neal Richardson <neal.p.richard...@gmail.com> wrote:

> Would it make sense to transfer all governance of the parquet-cpp
> implementation to Apache Arrow? It seems like that's where we de facto are
> already, so that would resolve these ambiguities and put it in line with
> the Rust implementation.
>
> Would the Parquet PMC be opposed to formalizing this change?
>
> Neal
>
> On Thu, Feb 2, 2023 at 6:48 AM Raphael Taylor-Davies
> <r.taylordav...@googlemail.com.invalid> wrote:
>
> > Hi,
> >
> > > Does the parquet rust implementation have a similar issue?
> >
> > Similar to the C++ implementation, the Rust implementation lives under
> > the Apache Arrow umbrella and does not have any direct affiliation with
> > the Apache Parquet project that I am aware of, beyond using the same
> > format specification. However, as almost all of the users and
> > contributions are with respect to the arrow interfaces, and not the
> > parquet record APIs, there perhaps isn't the same ambiguity as
> > encountered with the C++ implementation. I would expect all issues to be
> > raised in the arrow-rs repository, and a PARQUET Jira only raised,
> > likely by myself or whoever is triaging the issue, if there is some
> > issue/ambiguity pertaining to the format itself.
> >
> > Kind Regards,
> >
> > Raphael
> >
> > On 02/02/2023 01:58, Gang Wu wrote:
> > > Hi Will,
> > >
> > > AFAIK, the Apache Parquet community no longer considers contribution to
> > > parquet-cpp when promoting new committers after the donation to Apache
> > > Arrow.
> > >
> > > It would be a dilemma for the parquet-cpp contributors if neither the
> > > Apache Arrow community nor the Apache Parquet community recognizes their
> > > work.
> > >
> > > Does the parquet rust implementation have a similar issue?
> > >
> > > Best,
> > > Gang
> > >
> > > On Thu, Feb 2, 2023 at 3:27 AM Will Jones <will.jones...@gmail.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> A while back, the Parquet C++ implementation was merged into the Apache
> > >> Arrow monorepo [1]. As I understand it, this helped the development
> > >> process immensely. However, I am noticing some governance issues because
> > >> of it.
> > >>
> > >> First, it's not obvious where issues are supposed to be opened: in
> > >> Parquet Jira or Arrow GitHub issues. Looking back at some of the original
> > >> discussion, it looks like the intention was
> > >>
> > >>> * use PARQUET-XXX for issues relating to Parquet core
> > >>> * use ARROW-XXX for issues relating to Arrow's consumption of Parquet
> > >>>   core (e.g. changes that are in parquet/arrow right now)
> > >>
> > >> The README for the old parquet-cpp repo [3] states instead in its
> > >> migration note:
> > >>
> > >>    JIRA issues should continue to be opened in the PARQUET JIRA project.
> > >>
> > >> Either way, it doesn't seem like this process is obvious to people.
> > >> Perhaps we could clarify this and add notices to Arrow's GitHub issues
> > >> template?
> > >>
> > >> Second, committer status is a little unclear. I am a committer on Arrow,
> > >> but not on Parquet right now. Does that mean I should only merge Parquet
> > >> C++ PRs for code changes in parquet/arrow? Or that I shouldn't merge
> > >> Parquet changes at all?
> > >>
> > >> Also, are the contributions to Arrow C++ Parquet being actively reviewed
> > >> for potential new committers?
> > >>
> > >> Best,
> > >>
> > >> Will Jones
> > >>
> > >> [1] https://lists.apache.org/thread/76wzx2lsbwjl363bg066g8kdsocd03rw
> > >> [2] https://lists.apache.org/thread/dkh6vjomcfyjlvoy83qdk9j5jgxk7n4j
> > >> [3] https://github.com/apache/parquet-cpp