After a lot of time beating my head against Windows toolchain issues
(I now know a _lot_ about this topic!) I have a green build at

https://github.com/apache/arrow/pull/2453

I'd like to merge this before much more time passes (i.e. today if
possible) and work on getting the outstanding patches migrated.

The only code that isn't a straight copy is

https://github.com/apache/arrow/pull/2453/commits/fe5d435c9c58af42df4a37e7c97e37f33ae1857d

This contains all the modifications to the build system and CI to get
things fully working.

I will have to rebase (preserving the author and committer for each
patch) and then merge --ff-only to get this in.
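
As a rough sketch of how an outstanding parquet-cpp patch might be
edited to apply after the merge: the Python below rewrites only the
file paths in a unified diff, assuming the sole difference is that src/
in apache/parquet-cpp becomes cpp/src/ in apache/arrow. The function
name rewrite_patch_paths and its default prefixes are hypothetical, and
patches touching the build system or files outside src/ would still
need manual work.

```python
def rewrite_patch_paths(patch_text, old="src/", new="cpp/src/"):
    """Rewrite file paths in a git-style unified diff.

    Assumption (not confirmed by the merge PR): every migrated file
    simply moves from `old` to `new`, with no renames inside the tree.
    """
    out = []
    in_header = False  # True between a "diff --git" line and the first hunk
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("diff --git "):
            in_header = True
            # Rewrite both the a/ and b/ paths on the diff header line.
            line = line.replace(" a/" + old, " a/" + new, 1)
            line = line.replace(" b/" + old, " b/" + new, 1)
        elif line.startswith("@@"):
            # Hunk content starts here; leave it untouched.
            in_header = False
        elif in_header:
            for marker in ("--- a/", "+++ b/"):
                if line.startswith(marker + old):
                    line = marker + new + line[len(marker + old):]
        out.append(line)
    return "".join(out)
```

For patches confined to src/, passing --directory=cpp to git apply or
git am should have a similar effect without editing the patch file.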

- Wes
On Tue, Sep 4, 2018 at 2:22 PM Wes McKinney <[email protected]> wrote:
>
> Great. It is definitely going to require some follow-up patches to fix
> up the various packaging tasks, but at least the Linux Python wheels
> will still be working to start.
> On Tue, Sep 4, 2018 at 2:04 PM Uwe L. Korn <[email protected]> wrote:
> >
> > Hello Wes,
> >
> > I don't have much time this week, but I hope to squeeze in some minutes
> > tomorrow afternoon to review the code. As this is a very big merge, I want
> > to be extra careful not to break anything really badly. Hopefully more eyes
> > will help.
> >
> > Thank you for all the work in pushing this forward over the last few days!
> >
> > Uwe
> >
> > On Tue, Sep 4, 2018, at 6:27 PM, Wes McKinney wrote:
> > > Dear all,
> > >
> > > The repo merge is nearly ready to go modulo some fixes to CI. There
> > > will be a number of follow-up issues to re-establish the various
> > > (untested) build procedures in parquet-cpp.
> > >
> > > https://github.com/apache/arrow/pull/2453
> > >
> > > I would like to merge this by EOD Wednesday 9/5, or Thursday at the
> > > latest, so we can get the patches from apache/parquet-cpp moved over
> > > and avoid any disruption to the development process. If there are any
> > > comments, please let me know.
> > >
> > > - Wes
> > > On Tue, Aug 21, 2018 at 12:23 PM Wes McKinney <[email protected]> wrote:
> > > >
> > > > hi all,
> > > >
> > > > With 3 binding +1 votes, the vote carries. We will discuss with the
> > > > Apache Arrow community how specifically to proceed.
> > > >
> > > > I have already done the preparatory work to undertake the merge
> > > >
> > > > https://github.com/apache/arrow/pull/2453
> > > >
> > > > thanks
> > > > Wes
> > > >
> > > > On Tue, Aug 21, 2018 at 10:41 AM, Wes McKinney <[email protected]> wrote:
> > > > > Yes, feel free to have a look at
> > > > >
> > > > > https://github.com/apache/arrow/pull/2453
> > > > >
> > > > > I'm not much in favor of having a commingled non-linear history that
> > > > > makes git bisect difficult. We will have to discuss on the Arrow ML.
> > > > >
> > > > > Here's an example from Apache Spark where a similar merge took place
> > > > >
> > > > > https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3d53
> > > > >
> > > > > It would be my preference to have a single squashed commit whose
> > > > > message attributes the developers of the code and provides links back
> > > > > to the original commit history.
> > > > >
> > > > > - Wes
> > > > >
> > > > >
> > > > > On Tue, Aug 21, 2018 at 9:52 AM, Uwe L. Korn <[email protected]> wrote:
> > > > > >> I have a very strong preference to keep the git history. I will have
> > > > > >> a look tomorrow to find the correct git magic to get a linear
> > > > > >> history. For me a single merge commit would be ok, but I'm fine with
> > > > > >> spending an additional hour on this if you care strongly about linear
> > > > > >> history.
> > > > >>
> > > > >> Uwe
> > > > >>
> > > > >> On Sun, Aug 19, 2018, at 7:36 PM, Wes McKinney wrote:
> > > > >>> OK. I'm a bit -0 on doing anything that results in Arrow having a
> > > > >>> nonlinear git history (and rebasing is not really an option) but we
> > > > >>> can discuss that more later
> > > > >>>
> > > > > >>> On Sun, Aug 19, 2018 at 8:50 AM, Uwe L. Korn <[email protected]> wrote:
> > > > > >>> > +1 on this, but also see my comments in the mail on the discussions.
> > > > >>> >
> > > > > >>> > We should also keep the git history of parquet-cpp; that should
> > > > > >>> > not be hard with git, and there is probably a StackOverflow answer
> > > > > >>> > out there that gives you the commands to do the merge.
> > > > >>> >
> > > > >>> > Uwe
> > > > >>> >
> > > > >>> > On Fri, Aug 17, 2018, at 12:57 AM, Wes McKinney wrote:
> > > > > >>> >> In case any are interested: I estimate the work involved in the
> > > > > >>> >> migration to be about a full day in total, possibly less. As soon
> > > > > >>> >> as the migration plan is decided upon, I intend to execute it ASAP
> > > > > >>> >> so that ongoing development efforts are not disrupted.
> > > > > >>> >>
> > > > > >>> >> Additionally, in-flight patches do not all need to be merged.
> > > > > >>> >> Patches can be easily edited to apply against the modified
> > > > > >>> >> repository structure.
> > > > >>> >>
> > > > > >>> >> On Wed, Aug 15, 2018 at 6:04 PM, Wes McKinney <[email protected]> wrote:
> > > > >>> >> > hi all,
> > > > >>> >> >
> > > > > >>> >> > As discussed on the mailing list [1] I am proposing to undertake a
> > > > > >>> >> > restructuring of the development process for parquet-cpp and its
> > > > > >>> >> > consumption in the Arrow ecosystem to benefit the developers and users
> > > > > >>> >> > of both communities.
> > > > >>> >> >
> > > > >>> >> > The specific actions we would take would be:
> > > > >>> >> >
> > > > > >>> >> > 1) Move the source code currently located at src/ in the
> > > > > >>> >> > apache/parquet-cpp repository [2] to the cpp/src/ directory located in
> > > > > >>> >> > apache/arrow [3]
> > > > >>> >> >
> > > > > >>> >> > 2) The parquet code tree would remain separate from the Arrow code
> > > > > >>> >> > tree, though the two projects will continue to share code as they do
> > > > > >>> >> > now
> > > > >>> >> >
> > > > > >>> >> > 3) The build system in apache/parquet-cpp would be effectively
> > > > > >>> >> > deprecated and can be mostly discarded, as it is largely redundant and
> > > > > >>> >> > duplicated from the build system in apache/arrow
> > > > >>> >> >
> > > > > >>> >> > 4) The Parquet and Arrow C++ communities will collaborate to provide
> > > > > >>> >> > development workflows to enable contributors working exclusively on
> > > > > >>> >> > the Parquet core functionality to be able to work unencumbered by
> > > > > >>> >> > unnecessary build or test dependencies from the rest of the Arrow
> > > > > >>> >> > codebase. Note that parquet-cpp already builds a significant portion
> > > > > >>> >> > of Apache Arrow en route to creating its libraries
> > > > >>> >> >
> > > > > >>> >> > 5) The Parquet community can create scripts to "cut" Parquet C++
> > > > > >>> >> > releases by packaging up the appropriate components and ensuring that
> > > > > >>> >> > they can be built and installed independently as now
> > > > >>> >> >
> > > > > >>> >> > 6) The CI processes would be merged -- since we already build the
> > > > > >>> >> > Parquet libraries in Arrow's CI workflow, this would amount to
> > > > > >>> >> > building the Parquet unit tests and running them.
> > > > >>> >> >
> > > > > >>> >> > 7) Patches contributed that do not involve Arrow-related functionality
> > > > > >>> >> > could use the PARQUET-XXXX marking, though some ARROW-XXXX patches may
> > > > > >>> >> > span both codebases
> > > > >>> >> >
> > > > > >>> >> > 8) Parquet C++ committers can be given push rights on apache/arrow
> > > > > >>> >> > subject to ongoing good citizenry (e.g. not merging patches that break
> > > > > >>> >> > builds). The Arrow PMC may need to vote on the procedure for offering
> > > > > >>> >> > pass-through commit rights to anyone who has been invited to be a
> > > > > >>> >> > committer for Apache Parquet
> > > > >>> >> >
> > > > > >>> >> > 9) The contributors who work on both Arrow and Parquet will work in
> > > > > >>> >> > good faith to ensure that the needs of Parquet-only developers (i.e.
> > > > > >>> >> > who consume Parquet files in some way unrelated to the Arrow columnar
> > > > > >>> >> > standard) are accommodated
> > > > >>> >> >
> > > > > >>> >> > There are a number of particular details we will need to discuss
> > > > > >>> >> > further (such as the specific logistics of the codebase surgery; e.g.
> > > > > >>> >> > how to manage the commit history in apache/parquet-cpp -- do we care
> > > > > >>> >> > about git blame?)
> > > > >>> >> >
> > > > > >>> >> > This vote is to determine if the Parquet PMC is in favor of working in
> > > > > >>> >> > good faith to execute on the above plan. I will inquire with the Arrow
> > > > > >>> >> > PMC to see if we need to have a corresponding vote there, and also how
> > > > > >>> >> > to handle the management of commit rights.
> > > > >>> >> >
> > > > >>> >> > [ ] +1: In favor of implementing the proposed monorepo plan
> > > > >>> >> > [ ] +0: . . .
> > > > >>> >> > [ ] -1: Not in favor because . . .
> > > > >>> >> >
> > > > >>> >> > Here is my vote: +1.
> > > > >>> >> >
> > > > >>> >> > Thank you,
> > > > >>> >> > Wes
> > > > >>> >> >
> > > > >>> >> > [1]: 
> > > > >>> >> > https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
> > > > >>> >> > [2]: 
> > > > >>> >> > https://github.com/apache/parquet-cpp/tree/master/src/parquet
> > > > >>> >> > [3]: https://github.com/apache/arrow/tree/master/cpp/src
