Dear all,

The repo merge is nearly ready to go modulo some fixes to CI. There will be a
number of follow-up issues to re-establish the various (untested) build
procedures in parquet-cpp

https://github.com/apache/arrow/pull/2453

I would like to merge this by EOD Wednesday 9/5, or Thursday at latest, so we
can get the patches from apache/parquet-cpp moved over and avoid any
disruption to the development process. If there are any comments please let me
know

- Wes

On Tue, Aug 21, 2018 at 12:23 PM Wes McKinney <[email protected]> wrote:
>
> hi all,
>
> with 3 binding +1 votes, the vote carries. We will discuss with Apache
> Arrow about how to specifically proceed
>
> I have already done the preparatory work to undertake the merge
>
> https://github.com/apache/arrow/pull/2453
>
> thanks
> Wes
>
> On Tue, Aug 21, 2018 at 10:41 AM, Wes McKinney <[email protected]> wrote:
> > Yes, feel free to have a look at
> >
> > https://github.com/apache/arrow/pull/2453
> >
> > I'm not very in favor of having a commingled non-linear history that
> > makes git bisect difficult. We will have to discuss on the Arrow ML
> >
> > Here's an example from Apache Spark where a similar merge took place
> >
> > https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3d53
> >
> > It would be my preference to have a single squashed commit whose
> > message attributes the developers of the code and provides links back
> > to the original commit history in the commit message
> >
> > - Wes
> >
> >
> > On Tue, Aug 21, 2018 at 9:52 AM, Uwe L. Korn <[email protected]> wrote:
> >> I have a very strong preference to keep the git history. I will have a
> >> look tomorrow to find the correct git magic to get a linear history. For
> >> me a single merge commit would be ok but I'm fine to spend an additional
> >> hour on this if you care strongly about linear history.
> >>
> >> Uwe
> >>
> >> On Sun, Aug 19, 2018, at 7:36 PM, Wes McKinney wrote:
> >>> OK. I'm a bit -0 on doing anything that results in Arrow having a
> >>> nonlinear git history (and rebasing is not really an option) but we
> >>> can discuss that more later
> >>>
> >>> On Sun, Aug 19, 2018 at 8:50 AM, Uwe L. Korn <[email protected]> wrote:
> >>> > +1 on this but also see my comments in the mail on the discussions.
> >>> >
> >>> > We should also keep the git history of parquet-cpp, that should not be
> >>> > hard with git and there is probably a StackOverflow answer out there
> >>> > that gives you the commands to do the merge.
> >>> >
> >>> > Uwe
> >>> >
> >>> > On Fri, Aug 17, 2018, at 12:57 AM, Wes McKinney wrote:
> >>> >> In case any are interested: my estimate of the work involved in the
> >>> >> migration is about a full day of total work, possibly less. As soon
> >>> >> as the migration plan is decided upon I intend to execute ASAP so that
> >>> >> ongoing development efforts are not disrupted.
> >>> >>
> >>> >> Additionally, in-flight patches do not all need to be merged. Patches
> >>> >> can be easily edited to apply against the modified repository
> >>> >> structure
> >>> >>
> >>> >> On Wed, Aug 15, 2018 at 6:04 PM, Wes McKinney <[email protected]> wrote:
> >>> >> > hi all,
> >>> >> >
> >>> >> > As discussed on the mailing list [1] I am proposing to undertake a
> >>> >> > restructuring of the development process for parquet-cpp and its
> >>> >> > consumption in the Arrow ecosystem to benefit the developers and users
> >>> >> > of both communities.
> >>> >> >
> >>> >> > The specific actions we would take would be:
> >>> >> >
> >>> >> > 1) Move the source code currently located at src/ in the
> >>> >> > apache/parquet-cpp repository [2] to the cpp/src/ directory located in
> >>> >> > apache/arrow [3]
> >>> >> >
> >>> >> > 2) The parquet code tree would remain separate from the Arrow code
> >>> >> > tree, though the two projects will continue to share code as they do now
> >>> >> >
> >>> >> > 3) The build system in apache/parquet-cpp would be effectively
> >>> >> > deprecated and can be mostly discarded, as it is largely redundant and
> >>> >> > duplicated from the build system in apache/arrow
> >>> >> >
> >>> >> > 4) The Parquet and Arrow C++ communities will collaborate to provide
> >>> >> > development workflows to enable contributors working exclusively on
> >>> >> > the Parquet core functionality to be able to work unencumbered with
> >>> >> > unnecessary build or test dependencies from the rest of the Arrow
> >>> >> > codebase. Note that parquet-cpp already builds a significant portion
> >>> >> > of Apache Arrow en route to creating its libraries
> >>> >> >
> >>> >> > 5) The Parquet community can create scripts to "cut" Parquet C++
> >>> >> > releases by packaging up the appropriate components and ensuring that
> >>> >> > they can be built and installed independently as now
> >>> >> >
> >>> >> > 6) The CI processes would be merged -- since we already build the
> >>> >> > Parquet libraries in Arrow's CI workflow, this would amount to
> >>> >> > building the Parquet unit tests and running them.
> >>> >> >
> >>> >> > 7) Patches contributed that do not involve Arrow-related functionality
> >>> >> > could use the PARQUET-XXXX marking, though some ARROW-XXXX patches may
> >>> >> > span both codebases
> >>> >> >
> >>> >> > 8) Parquet C++ committers can be given push rights on apache/arrow
> >>> >> > subject to ongoing good citizenry (e.g. not merging patches that break
> >>> >> > builds). The Arrow PMC may need to vote on the procedure for offering
> >>> >> > pass-through commit rights to anyone who has been invited to be a
> >>> >> > committer for Apache Parquet
> >>> >> >
> >>> >> > 9) The contributors who work on both Arrow and Parquet will work in
> >>> >> > good faith to ensure that the needs of Parquet-only developers (i.e.
> >>> >> > who consume Parquet files in some way unrelated to the Arrow columnar
> >>> >> > standard) are accommodated
> >>> >> >
> >>> >> > There are a number of particular details we will need to discuss
> >>> >> > further (such as the specific logistics of the codebase surgery; e.g.
> >>> >> > how to manage the commit history in apache/parquet-cpp -- do we care
> >>> >> > about git blame?)
> >>> >> >
> >>> >> > This vote is to determine if the Parquet PMC is in favor of working in
> >>> >> > good faith to execute on the above plan. I will inquire with the Arrow
> >>> >> > PMC to see if we need to have a corresponding vote there, and also how
> >>> >> > to handle the management of commit rights.
> >>> >> >
> >>> >> > [ ] +1: In favor of implementing the proposed monorepo plan
> >>> >> > [ ] +0: . . .
> >>> >> > [ ] -1: Not in favor because . . .
> >>> >> >
> >>> >> > Here is my vote: +1.
> >>> >> >
> >>> >> > Thank you,
> >>> >> > Wes
> >>> >> >
> >>> >> > [1]: https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
> >>> >> > [2]: https://github.com/apache/parquet-cpp/tree/master/src/parquet
> >>> >> > [3]: https://github.com/apache/arrow/tree/master/cpp/src
