Great. It is definitely going to require some follow-up patches to fix
up the various packaging tasks, but at least the Linux Python wheels
will still be working to start with.
On Tue, Sep 4, 2018 at 2:04 PM Uwe L. Korn wrote:
>
> Hello Wes,
>
> I don't have much time this week, but I hope to squeeze in some minutes
> tomorrow afternoon to review the code. As this is a very big merge, I want
> to be extra careful not to break anything really badly. Hopefully more
> eyes will help.
>
> Thank you for all the work in pushing this forward in the last days!
>
> Uwe
>
> On Tue, Sep 4, 2018, at 6:27 PM, Wes McKinney wrote:
> > Dear all,
> >
> > The repo merge is nearly ready to go, modulo some fixes to CI. There
> > will be a number of follow-up issues to re-establish the various
> > (untested) build procedures in parquet-cpp.
> >
> > https://github.com/apache/arrow/pull/2453
> >
> > I would like to merge this by EOD Wednesday 9/5, or Thursday at the
> > latest, so we can get the patches from apache/parquet-cpp moved over
> > and avoid any disruption to the development process. If there are any
> > comments, please let me know.
> >
> > - Wes
> > On Tue, Aug 21, 2018 at 12:23 PM Wes McKinney wrote:
> > >
> > > hi all,
> > >
> > > with 3 binding +1 votes, the vote carries. We will discuss with Apache
> > > Arrow how specifically to proceed.
> > >
> > > I have already done the preparatory work to undertake the merge.
> > >
> > > https://github.com/apache/arrow/pull/2453
> > >
> > > thanks
> > > Wes
> > >
> > > On Tue, Aug 21, 2018 at 10:41 AM, Wes McKinney wrote:
> > > > Yes, feel free to have a look at
> > > >
> > > > https://github.com/apache/arrow/pull/2453
> > > >
> > > > I'm not very much in favor of having a commingled, non-linear
> > > > history that makes git bisect difficult. We will have to discuss
> > > > this on the Arrow ML.
> > > >
> > > > Here's an example from Apache Spark where a similar merge took place:
> > > >
> > > > https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3d53
> > > >
> > > > My preference would be a single squashed commit whose message
> > > > attributes the developers of the code and provides links back to the
> > > > original commit history.
> > > >
> > > > - Wes
> > > >
> > > >
> > > > On Tue, Aug 21, 2018 at 9:52 AM, Uwe L. Korn wrote:
> > > >> I have a very strong preference for keeping the git history. I will
> > > >> have a look tomorrow to find the correct git magic to get a linear
> > > >> history. For me a single merge commit would be OK, but I'm fine with
> > > >> spending an additional hour on this if you care strongly about
> > > >> linear history.
> > > >>
> > > >> Uwe
> > > >>
> > > >> On Sun, Aug 19, 2018, at 7:36 PM, Wes McKinney wrote:
> > > >>> OK. I'm a bit -0 on doing anything that results in Arrow having a
> > > >>> non-linear git history (and rebasing is not really an option), but
> > > >>> we can discuss that more later.
> > > >>>
> > > >>> On Sun, Aug 19, 2018 at 8:50 AM, Uwe L. Korn wrote:
> > > >>> > +1 on this, but also see my comments in my mail in the discussion
> > > >>> > thread.
> > > >>> >
> > > >>> > We should also keep the git history of parquet-cpp; that should
> > > >>> > not be hard with git, and there is probably a StackOverflow answer
> > > >>> > out there that gives you the commands to do the merge.
> > > >>> >
> > > >>> > Uwe
> > > >>> >
> > > >>> > On Fri, Aug 17, 2018, at 12:57 AM, Wes McKinney wrote:
> > > >>> >> In case any are interested: I estimate the work involved in the
> > > >>> >> migration at about a full day of total work, possibly less. As
> > > >>> >> soon as the migration plan is decided upon, I intend to execute
> > > >>> >> it ASAP so that ongoing development efforts are not disrupted.
> > > >>> >>
> > > >>> >> Additionally, in-flight patches do not all need to be merged.
> > > >>> >> Patches can easily be edited to apply against the modified
> > > >>> >> repository structure.
> > > >>> >> On Wed, Aug 15, 2018 at 6:04 PM, Wes McKinney wrote:
> > > >>> >> > hi all,
> > > >>> >> >
> > > >>> >> > As discussed on the mailing list [1], I am proposing to
> > > >>> >> > undertake a restructuring of the development process for
> > > >>> >> > parquet-cpp and its consumption in the Arrow ecosystem, to
> > > >>> >> > benefit the developers and users of both communities.
> > > >>> >> >
> > > >>> >> > The specific actions we would take would be:
> > > >>> >> >
> > > >>> >> > 1) Move the source code currently located at src/ in the
> > > >>> >> > apache/parquet-cpp repository [2] to the cpp/src/ directory
> > > >>> >> > located in apache/arrow [3].
> > > >>> >> >
> > > >>> >> > 2) The Parquet code tree would remain separate from the Arrow
> > > >>> >> > code tree, though the two projects will continue to share code
> > > >>> >> > as they do now.
> > > >>> >> >
> > > >>> >> > 3) The build system in apache/parquet-cpp would be effectively
> > > >>> >> > deprecated and could be mostly discarded, as it is largely
> > > >>> >> > redundant with, and duplicated from, the build system in
> > > >>> >> > apache/arrow.
> > > >>> >> >
> > > >>> >> > 4) The Parquet and Arrow C++ communities will collaborate to
> > > >>> >> > provide development workflows that enable contributors working
> > > >>> >> > exclusively on the core Parquet functionality to work
> > > >>> >> > unencumbered by unnecessary build or test dependencies from the
> > > >>> >> > rest of the Arrow codebase. Note that parquet-cpp already builds
> > > >>> >> > a significant portion of Apache Arrow en route to creating its
> > > >>> >> > libraries.
> > > >>> >> >
> > > >>> >> > 5) The Parquet community can create scripts to "cut" Parquet
> > > >>> >> > C++ releases by packaging up the appropriate components and
> > > >>> >> > ensuring that they can be built and installed independently, as
> > > >>> >> > they can now.
> > > >>> >> >
> > > >>> >> > 6) The CI processes would be merged. Since we already build the
> > > >>> >> > Parquet libraries in Arrow's CI workflow, this would amount to
> > > >>> >> > building the Parquet unit tests and running them.
> > > >>> >> >
> > > >>> >> > 7) Contributed patches that do not involve Arrow-related
> > > >>> >> > functionality could use the PARQUET-XXXX marking, though some
> > > >>> >> > ARROW-XXXX patches may span both codebases.
> > > >>> >> >
> > > >>> >> > 8) Parquet C++ committers can be given push rights on
> > > >>> >> > apache/arrow, subject to ongoing good citizenry (e.g. not
> > > >>> >> > merging patches that break builds). The Arrow PMC may need to
> > > >>> >> > vote on the procedure for offering pass-through commit rights
> > > >>> >> > to anyone who has been invited to be a committer for Apache
> > > >>> >> > Parquet.
> > > >>> >> >
> > > >>> >> > 9) The contributors who work on both Arrow and Parquet will
> > > >>> >> > work in good faith to ensure that the needs of Parquet-only
> > > >>> >> > developers (i.e. those who consume Parquet files in ways
> > > >>> >> > unrelated to the Arrow columnar standard) are accommodated.
> > > >>> >> >
> > > >>> >> > There are a number of particular details we will need to
> > > >>> >> > discuss further, such as the specific logistics of the codebase
> > > >>> >> > surgery (e.g. how to manage the commit history in
> > > >>> >> > apache/parquet-cpp: do we care about git blame?).
> > > >>> >> >
> > > >>> >> > This vote is to determine whether the Parquet PMC is in favor
> > > >>> >> > of working in good faith to execute the above plan. I will
> > > >>> >> > inquire with the Arrow PMC whether we need to hold a
> > > >>> >> > corresponding vote there, and also how to handle the
> > > >>> >> > management of commit rights.
> > > >>> >> >
> > > >>> >> > [ ] +1: In favor of implementing the proposed monorepo plan
> > > >>> >> > [ ] +0: . . .
> > > >>> >> > [ ] -1: Not in favor because . . .
> > > >>> >> >
> > > >>> >> > Here is my vote: +1.
> > > >>> >> >
> > > >>> >> > Thank you,
> > > >>> >> > Wes
> > > >>> >> >
> > > >>> >> > [1]: https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
> > > >>> >> > [2]: https://github.com/apache/parquet-cpp/tree/master/src/parquet
> > > >>> >> > [3]: https://github.com/apache/arrow/tree/master/cpp/src