Dear all -- the merge has been completed, thank you! 318 patches
(after the filter-branch grafting procedure) were merged to
apache/arrow

We have some follow up work to do:

* Move patches from apache/parquet-cpp to apache/arrow
* Add CONTRIBUTING.md and note to README that patches are no longer
accepted at the old location
* Migrate CLI utilities and other small items that did not survive the
merge: tools/, benchmarks/, and examples/
* Develop new release procedure for Apache Parquet

On the third point, we can also import the git history of those
directories if desired. Incorporating them into the build should be
easy compared to the library integration.

There are already JIRA issues open for some of these; for anything
else, please create issues so we can keep track.
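
As a rough illustration of the first item (moving patches over): since
the code that lived under src/ in apache/parquet-cpp now lives under
cpp/src/ in apache/arrow, an outstanding patch mostly needs its file
paths re-rooted. Below is a minimal sketch, assuming git-format unified
diffs; the helper names reroot and reroot_patch are hypothetical, and
this is not necessarily the procedure that will be used.

```python
import re

# Assumed layout change: src/... in apache/parquet-cpp becomes
# cpp/src/... in apache/arrow.
OLD_ROOT = "src/"
NEW_ROOT = "cpp/src/"


def reroot(path):
    """Return path moved under NEW_ROOT if it starts with OLD_ROOT."""
    if path.startswith(OLD_ROOT):
        return NEW_ROOT + path[len(OLD_ROOT):]
    return path


def reroot_patch(patch_text):
    """Rewrite file paths in the headers of a git-format unified diff.

    Only the 'diff --git', '--- a/' and '+++ b/' header lines are
    changed. Hunk bodies are left untouched, and rename/copy headers
    are not handled in this sketch.
    """
    out = []
    in_header = False
    for line in patch_text.splitlines(keepends=True):
        if line.startswith("diff --git "):
            in_header = True
            m = re.match(r"diff --git a/(\S+) b/(\S+)(\s*)$", line)
            if m:
                line = "diff --git a/%s b/%s%s" % (
                    reroot(m.group(1)), reroot(m.group(2)), m.group(3))
        elif line.startswith("@@"):
            # First hunk marker: everything after this is patch body.
            in_header = False
        elif in_header and line.startswith(("--- a/", "+++ b/")):
            line = line[:6] + reroot(line[6:])
        out.append(line)
    return "".join(out)
```

git apply and git am also accept a --directory=<root> option that
prepends a root to every path, which may be simpler when an entire
patch moves under a single prefix.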

I'm already quite excited to get busy with some refactoring and
internals improvements that I had avoided because of the painful
development procedure.

Thanks,
Wes
On Fri, Sep 7, 2018 at 11:18 AM Wes McKinney <[email protected]> wrote:
>
> After a lot of time beating my head against Windows toolchain issues
> (I now know a _lot_ about this topic!) I have a green build at
>
> https://github.com/apache/arrow/pull/2453
>
> I'd like to merge this before much more time passes (i.e. today if
> possible) and work on getting the outstanding patches migrated.
>
> The only code that isn't a straight-copy is
>
> https://github.com/apache/arrow/pull/2453/commits/fe5d435c9c58af42df4a37e7c97e37f33ae1857d
>
> This contains all the modifications to the build system and CI to get
> things fully working.
>
> I will have to rebase (preserving the author and committer for each
> patch) and then merge --ff-only to get this in
>
> - Wes
> On Tue, Sep 4, 2018 at 2:22 PM Wes McKinney <[email protected]> wrote:
> >
> > Great. It is definitely going to require some follow up patches to fix
> > up the various packaging tasks, but at least the Linux Python wheels
> > will still be working to start
> > On Tue, Sep 4, 2018 at 2:04 PM Uwe L. Korn <[email protected]> wrote:
> > >
> > > Hello Wes,
> > >
> > > > I don't have much time this week but I hope to squeeze in some minutes 
> > > tomorrow afternoon to review the code. As this is a very big merge, I 
> > > want to be extra careful to not break anything really badly. Hopefully 
> > > more eyes will help.
> > >
> > > Thank you for all the work in pushing this forward in the last days!
> > >
> > > Uwe
> > >
> > > On Tue, Sep 4, 2018, at 6:27 PM, Wes McKinney wrote:
> > > > Dear all,
> > > >
> > > > The repo merge is nearly ready to go modulo some fixes to CI. There
> > > > will be a number of follow up issues to re-establish the various
> > > > (untested) build procedures in parquet-cpp
> > > >
> > > > https://github.com/apache/arrow/pull/2453
> > > >
> > > > I would like to merge this by EOD Wednesday 9/5, or Thursday at
> > > > latest, so we can get the patches from apache/parquet-cpp moved over
> > > > and avoid any disruption to development process. If there are any
> > > > comments please let me know
> > > >
> > > > - Wes
> > > > On Tue, Aug 21, 2018 at 12:23 PM Wes McKinney <[email protected]> 
> > > > wrote:
> > > > >
> > > > > hi all,
> > > > >
> > > > > with 3 binding +1 votes, the vote carries. We will discuss with Apache
> > > > > Arrow about how to specifically proceed
> > > > >
> > > > > I have already done the preparatory work to undertake the merge
> > > > >
> > > > > https://github.com/apache/arrow/pull/2453
> > > > >
> > > > > thanks
> > > > > Wes
> > > > >
> > > > > On Tue, Aug 21, 2018 at 10:41 AM, Wes McKinney <[email protected]> 
> > > > > wrote:
> > > > > > Yes, feel free to have a look at
> > > > > >
> > > > > > https://github.com/apache/arrow/pull/2453
> > > > > >
> > > > > > > I'm not much in favor of having a commingled non-linear history that
> > > > > > > makes git bisect difficult. We will have to discuss on the Arrow ML
> > > > > >
> > > > > > Here's an example from Apache Spark where a similar merge took place
> > > > > >
> > > > > > https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3d53
> > > > > >
> > > > > > It would be my preference to have a single squashed commit whose
> > > > > > message attributes the developers of the code and provides links 
> > > > > > back
> > > > > > to the original commit history in the commit message
> > > > > >
> > > > > > - Wes
> > > > > >
> > > > > >
> > > > > > On Tue, Aug 21, 2018 at 9:52 AM, Uwe L. Korn <[email protected]> 
> > > > > > wrote:
> > > > > >> I have a very strong preference to keep the git history. I will 
> > > > > >> have a look tomorrow to find the correct git magic to get a linear 
> > > > > >> history. For me a single merge commit would be ok but I'm fine to 
> > > > > >> spend an additional hour on this if you care strongly about linear 
> > > > > >> history.
> > > > > >>
> > > > > >> Uwe
> > > > > >>
> > > > > >> On Sun, Aug 19, 2018, at 7:36 PM, Wes McKinney wrote:
> > > > > >>> OK. I'm a bit -0 on doing anything that results in Arrow having a
> > > > > >>> nonlinear git history (and rebasing is not really an option) but 
> > > > > >>> we
> > > > > >>> can discuss that more later
> > > > > >>>
> > > > > >>> On Sun, Aug 19, 2018 at 8:50 AM, Uwe L. Korn <[email protected]> 
> > > > > >>> wrote:
> > > > > >>> > +1 on this but also see my comments in the mail on the 
> > > > > >>> > discussions.
> > > > > >>> >
> > > > > >>> > We should also keep the git history of parquet-cpp, that should 
> > > > > >>> > not be hard with git and there is probably a StackOverflow 
> > > > > >>> > answer out there that gives you the commands to do the merge.
> > > > > >>> >
> > > > > >>> > Uwe
> > > > > >>> >
> > > > > >>> > On Fri, Aug 17, 2018, at 12:57 AM, Wes McKinney wrote:
> > > > > >>> >> In case any are interested: my estimate of the work involved 
> > > > > >>> >> in the
> > > > > > >>> >> migration is about a full day of total work, possibly less. 
> > > > > >>> >> As soon
> > > > > >>> >> as the migration plan is decided upon I intend to execute ASAP 
> > > > > >>> >> so that
> > > > > >>> >> ongoing development efforts are not disrupted.
> > > > > >>> >>
> > > > > >>> >> Additionally, in flight patches do not all need to be merged. 
> > > > > >>> >> Patches
> > > > > >>> >> can be easily edited to apply against the modified repository
> > > > > >>> >> structure
> > > > > >>> >>
> > > > > >>> >> On Wed, Aug 15, 2018 at 6:04 PM, Wes McKinney 
> > > > > >>> >> <[email protected]> wrote:
> > > > > >>> >> > hi all,
> > > > > >>> >> >
> > > > > >>> >> > As discussed on the mailing list [1] I am proposing to 
> > > > > >>> >> > undertake a
> > > > > >>> >> > restructuring of the development process for parquet-cpp and 
> > > > > >>> >> > its
> > > > > >>> >> > consumption in the Arrow ecosystem to benefit the developers 
> > > > > >>> >> > and users
> > > > > >>> >> > of both communities.
> > > > > >>> >> >
> > > > > >>> >> > The specific actions we would take would be:
> > > > > >>> >> >
> > > > > >>> >> > 1) Move the source code currently located at src/ in the
> > > > > >>> >> > apache/parquet-cpp repository [2] to the cpp/src/ directory 
> > > > > >>> >> > located in
> > > > > >>> >> > apache/arrow [3]
> > > > > >>> >> >
> > > > > >>> >> > 2) The parquet code tree would remain separate from the 
> > > > > >>> >> > Arrow code
> > > > > >>> >> > tree, though the two projects will continue to share code as 
> > > > > >>> >> > they do
> > > > > >>> >> > now
> > > > > >>> >> >
> > > > > >>> >> > 3) The build system in apache/parquet-cpp would be 
> > > > > >>> >> > effectively
> > > > > >>> >> > deprecated and can be mostly discarded, as it is largely 
> > > > > >>> >> > redundant and
> > > > > >>> >> > duplicated from the build system in apache/arrow
> > > > > >>> >> >
> > > > > >>> >> > 4) The Parquet and Arrow C++ communities will collaborate to 
> > > > > >>> >> > provide
> > > > > >>> >> > development workflows to enable contributors working 
> > > > > >>> >> > exclusively on
> > > > > >>> >> > the Parquet core functionality to be able to work 
> > > > > >>> >> > unencumbered with
> > > > > >>> >> > unnecessary build or test dependencies from the rest of the 
> > > > > >>> >> > Arrow
> > > > > >>> >> > codebase. Note that parquet-cpp already builds a significant 
> > > > > >>> >> > portion
> > > > > >>> >> > of Apache Arrow en route to creating its libraries
> > > > > >>> >> >
> > > > > >>> >> > 5) The Parquet community can create scripts to "cut" Parquet 
> > > > > >>> >> > C++
> > > > > >>> >> > releases by packaging up the appropriate components and 
> > > > > >>> >> > ensuring that
> > > > > >>> >> > they can be built and installed independently as now
> > > > > >>> >> >
> > > > > >>> >> > 6) The CI processes would be merged -- since we already 
> > > > > >>> >> > build the
> > > > > >>> >> > Parquet libraries in Arrow's CI workflow, this would amount 
> > > > > >>> >> > to
> > > > > >>> >> > building the Parquet unit tests and running them.
> > > > > >>> >> >
> > > > > >>> >> > 7) Patches contributed that do not involve Arrow-related 
> > > > > >>> >> > functionality
> > > > > >>> >> > could use the PARQUET-XXXX marking, though some ARROW-XXXX 
> > > > > >>> >> > patches may
> > > > > >>> >> > span both codebases
> > > > > >>> >> >
> > > > > >>> >> > 8) Parquet C++ committers can be given push rights on 
> > > > > >>> >> > apache/arrow
> > > > > >>> >> > subject to ongoing good citizenry (e.g. not merging patches 
> > > > > >>> >> > that break
> > > > > >>> >> > builds). The Arrow PMC may need to vote on the procedure for 
> > > > > >>> >> > offering
> > > > > >>> >> > pass-through commit rights to anyone who has been invited to 
> > > > > >>> >> > be a
> > > > > >>> >> > committer for Apache Parquet
> > > > > >>> >> >
> > > > > >>> >> > 9) The contributors who work on both Arrow and Parquet will 
> > > > > >>> >> > work in
> > > > > > >>> >> > good faith to ensure that the needs of Parquet-only 
> > > > > >>> >> > developers (i.e.
> > > > > >>> >> > who consume Parquet files in some way unrelated to the Arrow 
> > > > > >>> >> > columnar
> > > > > >>> >> > standard) are accommodated
> > > > > >>> >> >
> > > > > >>> >> > There are a number of particular details we will need to 
> > > > > >>> >> > discuss
> > > > > >>> >> > further (such as the specific logistics of the codebase 
> > > > > >>> >> > surgery; e.g.
> > > > > >>> >> > how to manage the commit history in apache/parquet-cpp -- do 
> > > > > >>> >> > we care
> > > > > >>> >> > about git blame?)
> > > > > >>> >> >
> > > > > >>> >> > This vote is to determine if the Parquet PMC is in favor of 
> > > > > >>> >> > working in
> > > > > >>> >> > good faith to execute on the above plan. I will inquire with 
> > > > > >>> >> > the Arrow
> > > > > >>> >> > PMC to see if we need to have a corresponding vote there, 
> > > > > >>> >> > and also how
> > > > > >>> >> > to handle the management of commit rights.
> > > > > >>> >> >
> > > > > >>> >> > [ ] +1: In favor of implementing the proposed monorepo plan
> > > > > >>> >> > [ ] +0: . . .
> > > > > >>> >> > [ ] -1: Not in favor because . . .
> > > > > >>> >> >
> > > > > >>> >> > Here is my vote: +1.
> > > > > >>> >> >
> > > > > >>> >> > Thank you,
> > > > > >>> >> > Wes
> > > > > >>> >> >
> > > > > >>> >> > [1]: 
> > > > > >>> >> > https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
> > > > > >>> >> > [2]: 
> > > > > >>> >> > https://github.com/apache/parquet-cpp/tree/master/src/parquet
> > > > > >>> >> > [3]: https://github.com/apache/arrow/tree/master/cpp/src
