IMO, moving parquet-cpp out of arrow is challenging as the dependency
chain looks like: arrow core <- parquet-cpp <- arrow dataset <- pyarrow

Best,
Gang

On Tue, May 14, 2024 at 12:38 PM Julien Le Dem <jul...@apache.org> wrote:

> It is great to see more momentum building.
> I myself have a little more time to contribute to Parquet.
>
> Personally I think moving it back would make sense.
> *However* I have personally contributed a lot more to the Java than the C++
> code base.
> That move was done initially because people contributing to the Arrow and
> Parquet C++ code bases were the same ones and circular dependencies were
> getting in the way (does Parquet depend on Arrow or the other way around?
> At the time it was both ways.). So to make this happen, we need enough
> Parquet C++ contributors that would be happy with the move and clarify
> which way the dependency goes. My take is that Parquet depends on Arrow but
> I'd be curious to see what others think.
> Julien
>
> On Sat, May 11, 2024 at 2:51 AM Andrew Lamb <andrewlam...@gmail.com>
> wrote:
>
> > It is great to see some additional enthusiasm and momentum around the
> > Apache Parquet implementation (congratulations on the release of
> > parquet-mr 1.14[1]!).
> >
> > As activity picks up, if the desire is to build more community around
> > Parquet, perhaps the Parquet PMC wants to encourage moving code back to
> > repositories managed by parquet (and out of arrow, for example). I realize
> > this would be a technical burden, but it might clarify communities and
> > committers.
> >
> > Andrew
> >
> > [1]: https://lists.apache.org/thread/2gggm938z0x9fx3wtwctfm5htsxlf3z4
> >
> >
> >
> > On Fri, May 10, 2024 at 11:45 PM Matt Topol <zotthewiz...@gmail.com>
> > wrote:
> >
> > > I just wanted to also raise the question of non-Java developers who have
> > > worked on the other Parquet implementations potentially being recognized
> > > as committers or otherwise on the Parquet project (speaking as the primary
> > > developer of the Go Parquet implementation, which also lives in the Arrow
> > > repository). It would be great to see some active contributors to
> > > parquet-cpp, parquet-go, or otherwise not just being considered but
> > > actively becoming committers.
> > >
> > > That's just my two cents from a community perspective.
> > >
> > > --Matt
> > >
> > > On Fri, May 10, 2024, 10:35 PM Jacob Wujciak <assignu...@apache.org>
> > > wrote:
> > >
> > > > Thank you, that sounds great! At first glance some seem to be rather old
> > > > and probably don't apply anymore.
> > > >
> > > > > BTW, do we really need to make a full copy of them to have a mirror in
> > > > > the Arrow GitHub issues?
> > > >
> > > > I am not sure I understand what you mean? A full copy of the
> > > > open/closed/all issues? I'd say only the (remaining) open issues would be
> > > > fine.
> > > > For reference this is what an imported issue looks like:
> > > > https://github.com/apache/arrow/issues/30543
> > > >
> > > > On Sat, May 11, 2024 at 04:09, Gang Wu <ust...@gmail.com> wrote:
> > > >
> > > > > I can initiate the vote. But before the vote, I think we need to revisit
> > > > > the states of all unresolved tickets and close some as needed.
> > > > >
> > > > > BTW, do we really need to make a full copy of them to have a mirror
> > > > > in the Arrow GitHub issues?
> > > > >
> > > > > I'd like to seek a consensus here before sending the vote.
> > > > >
> > > > > Best,
> > > > > Gang
> > > > >
> > > > > On Sat, May 11, 2024 at 8:46 AM Jacob Wujciak <assignu...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hello Everyone!
> > > > > >
> > > > > > It seems there is general agreement on this topic; it would be great if a
> > > > > > committer/PMC member could start a (lazy consensus) procedural vote.
> > > > > >
> > > > > > I will inquire how to handle the parquet-cpp component in Jira (ideally
> > > > > > disabling it, not removing it).
> > > > > > There are currently only ~70 open tickets for parquet-cpp. With the change
> > > > > > in repo, it is probably easier to just move open tickets, but I'll leave
> > > > > > that to Rok, who managed the transition of Arrow's 20k+ tickets too :D
> > > > > >
> > > > > > Thanks,
> > > > > > Jacob
> > > > > >
> > > > > > Arrow committer
> > > > > >
> > > > > > On 2024/04/25 05:31:18 Gang Wu wrote:
> > > > > > > I know we have some non-Java committers and PMC members. But after the
> > > > > > > parquet-cpp donation, it seems that no one who worked on Parquet from
> > > > > > > Arrow (cpp, rust, go, etc.) or other projects has been promoted to
> > > > > > > Parquet committer. It would be inconvenient for non-Java Parquet
> > > > > > > developers to work with the apache/parquet-format and
> > > > > > > apache/parquet-testing repositories. Furthermore, votes from these
> > > > > > > developers are not binding for a format change on the ML.
> > > > > > >
> > > > > > > Best,
> > > > > > > Gang
> > > > > > >
> > > > > > > On Wed, Apr 24, 2024 at 8:42 PM Uwe L. Korn <uw...@xhochy.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > > Should we consider
> > > > > > > > > Parquet developers from other projects than parquet-mr as Parquet
> > > > > > > > > committers?
> > > > > > > >
> > > > > > > > We are doing this (speaking as a Parquet PMC member who didn't work on
> > > > > > > > parquet-mr, but on parquet-cpp).
> > > > > > > >
> > > > > > > > Best
> > > > > > > > Uwe
> > > > > > > >
> > > > > > > > On Wed, Apr 24, 2024, at 2:38 PM, Gang Wu wrote:
> > > > > > > > > +1 for moving parquet-cpp issues from Apache Jira to Arrow's GitHub
> > > > > > > > > issues.
> > > > > > > > >
> > > > > > > > > Besides, I want to echo Will's question in the thread. Should we
> > > > > > > > > consider Parquet developers from other projects than parquet-mr as
> > > > > > > > > Parquet committers? Currently the apache/parquet-format and
> > > > > > > > > apache/parquet-testing repositories are solely governed by the
> > > > > > > > > Apache Parquet PMC. It would be better for the entire Parquet
> > > > > > > > > community if developers with sufficient contributions to open
> > > > > > > > > source Parquet projects (including but not limited to parquet-cpp,
> > > > > > > > > arrow-rs, cudf, etc.) could be considered for Parquet committer and
> > > > > > > > > PMC membership.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Gang
> > > > > > > > >
> > > > > > > > > On Wed, Apr 24, 2024 at 7:04 PM Uwe L. Korn <uw...@xhochy.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > >> I would be very supportive of this move. The Parquet C++
> > > > > > > > > >> development has been under the umbrella of the Arrow repository
> > > > > > > > > >> for more than five (six?) years now. Thus, the issues should
> > > > > > > > > >> also be aligned with the Arrow project.
> > > > > > > > >>
> > > > > > > > >> Uwe
> > > > > > > > >>
> > > > > > > > >> On Tue, Apr 23, 2024, at 8:27 PM, Rok Mihevc wrote:
> > > > > > > > > >> > Bumping this thread again to see if there is the will to call
> > > > > > > > > >> > for a vote and move parquet-cpp issues from Apache Jira to
> > > > > > > > > >> > Arrow's GitHub issues, as was done for Arrow.
> > > > > > > > > >> > I'm willing to do the move as I already did it for Arrow.
> > > > > > > > >> >
> > > > > > > > >> > Rok
> > > > > > > > >> >
> > > > > > > > > >> > On Sat, Apr 15, 2023 at 4:53 AM Micah Kornfield <emkornfi...@apache.org>
> > > > > > > > > >> > wrote:
> > > > > > > > >> >
> > > > > > > > > >> >> Bumping this thread again to see if any Parquet PMC members
> > > > > > > > > >> >> can chime in/maybe start a formal vote to move governance of
> > > > > > > > > >> >> Parquet-CPP under the umbrella.
> > > > > > > > >> >>
> > > > > > > > >> >> -Micah
> > > > > > > > >> >>
> > > > > > > > >> >> On 2023/02/02 10:34:25 Antoine Pitrou wrote:
> > > > > > > > >> >> >
> > > > > > > > >> >> >
> > > > > > > > >> >> > Hi Will,
> > > > > > > > >> >> >
> > > > > > > > > >> >> > On 01/02/2023 at 20:27, Will Jones wrote:
> > > > > > > > >> >> > >
> > > > > > > > > >> >> > > First, it's not obvious where issues are supposed to be
> > > > > > > > > >> >> > > opened: in Parquet Jira or Arrow GitHub issues. Looking
> > > > > > > > > >> >> > > back at some of the original discussion, it looks like
> > > > > > > > > >> >> > > the intention was
> > > > > > > > >> >> > >
> > > > > > > > > >> >> > > * use PARQUET-XXX for issues relating to Parquet core
> > > > > > > > > >> >> > >> * use ARROW-XXX for issues relating to Arrow's
> > > > > > > > > >> >> > >> consumption of Parquet core (e.g. changes that are in
> > > > > > > > > >> >> > >> parquet/arrow right now)
> > > > > > > > >> >> > >>
> > > > > > > > > >> >> > > The README for the old parquet-cpp repo [3] states
> > > > > > > > > >> >> > > instead in its migration note:
> > > > > > > > >> >> > >
> > > > > > > > > >> >> > >   JIRA issues should continue to be opened in the
> > > > > > > > > >> >> > >   PARQUET JIRA project.
> > > > > > > > >> >> > >
> > > > > > > > > >> >> > > Either way, it doesn't seem like this process is obvious
> > > > > > > > > >> >> > > to people. Perhaps we could clarify this and add notices
> > > > > > > > > >> >> > > to Arrow's GitHub issues template?
> > > > > > > > >> >> >
> > > > > > > > > >> >> > I agree we should clarify this. I have no personal
> > > > > > > > > >> >> > preference, but I will note that GitHub issues decrease
> > > > > > > > > >> >> > friction as having a GH account is already necessary for
> > > > > > > > > >> >> > submitting PRs.
> > > > > > > > >> >> >
> > > > > > > > > >> >> > > Second, committer status is a little unclear. I am a
> > > > > > > > > >> >> > > committer on Arrow, but not on Parquet right now. Does
> > > > > > > > > >> >> > > that mean I should only merge Parquet C++ PRs for code
> > > > > > > > > >> >> > > changes in parquet/arrow? Or that I shouldn't merge
> > > > > > > > > >> >> > > Parquet changes at all?
> > > > > > > > >> >> >
> > > > > > > > > >> >> > Since Parquet C++ is part of Arrow C++, you are allowed to
> > > > > > > > > >> >> > merge Parquet C++ changes. As always you should ensure you
> > > > > > > > > >> >> > have sufficient understanding of the contribution, and
> > > > > > > > > >> >> > that it follows established practices:
> > > > > > > > > >> >> > https://arrow.apache.org/docs/dev/developers/reviewing.html
> > > > > > > > >> >> >
> > > > > > > > > >> >> > > Also, are the contributions to Arrow C++ Parquet being
> > > > > > > > > >> >> > > actively reviewed for potential new committers?
> > > > > > > > >> >> >
> > > > > > > > >> >> > I would certainly do.
> > > > > > > > >> >> >
> > > > > > > > >> >> > Regards
> > > > > > > > >> >> >
> > > > > > > > >> >> > Antoine.
> > > > > > > > >> >> >
> > > > > > > > >> >> >
> > > > > > > > >> >>
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
