hi all,

with 3 binding +1 votes, the vote carries. We will discuss with the Apache
Arrow community how specifically to proceed.

I have already done the preparatory work to undertake the merge:

https://github.com/apache/arrow/pull/2453

thanks
Wes

On Tue, Aug 21, 2018 at 10:41 AM, Wes McKinney wrote:
> Yes, feel free to have a look at
>
> https://github.com/apache/arrow/pull/2453
>
> I'm not very much in favor of having a commingled, non-linear history that
> makes git bisect difficult. We will have to discuss this on the Arrow ML.
>
> Here's an example from Apache Spark where a similar merge took place
>
> https://github.com/apache/spark/commit/2fe0a1aaeebbf7f60bd4130847d738c29f1e3d53
>
> It would be my preference to have a single squashed commit whose
> message attributes the developers of the code and links back to
> the original commit history.
>
> - Wes
>
>
> On Tue, Aug 21, 2018 at 9:52 AM, Uwe L. Korn wrote:
>> I have a very strong preference for keeping the git history. I will have a look
>> tomorrow to find the correct git magic to get a linear history. For me a
>> single merge commit would be OK, but I'm happy to spend an additional hour on
>> this if you care strongly about a linear history.
>>
>> Uwe
>>
>> On Sun, Aug 19, 2018, at 7:36 PM, Wes McKinney wrote:
>>> OK. I'm a bit -0 on doing anything that results in Arrow having a
>>> nonlinear git history (and rebasing is not really an option), but we
>>> can discuss that more later.
>>>
>>> On Sun, Aug 19, 2018 at 8:50 AM, Uwe L. Korn wrote:
>>> > +1 on this but also see my comments in the mail on the discussions.
>>> >
>>> > We should also keep the git history of parquet-cpp, which should not be
>>> > hard with git; there is probably a StackOverflow answer out there that
>>> > gives the commands to do the merge.
>>> >
>>> > Uwe
>>> >
>>> > On Fri, Aug 17, 2018, at 12:57 AM, Wes McKinney wrote:
>>> >> In case any are interested: I estimate the work involved in the
>>> >> migration to be about a full day in total, possibly less. As soon
>>> >> as the migration plan is decided upon, I intend to execute it ASAP so
>>> >> that ongoing development efforts are not disrupted.
>>> >>
>>> >> Additionally, in-flight patches do not all need to be merged first.
>>> >> Patches can easily be edited to apply against the modified repository
>>> >> structure.
>>> >>
>>> >> On Wed, Aug 15, 2018 at 6:04 PM, Wes McKinney wrote:
>>> >> > hi all,
>>> >> >
>>> >> > As discussed on the mailing list [1] I am proposing to undertake a
>>> >> > restructuring of the development process for parquet-cpp and its
>>> >> > consumption in the Arrow ecosystem to benefit the developers and users
>>> >> > of both communities.
>>> >> >
>>> >> > The specific actions we would take would be:
>>> >> >
>>> >> > 1) Move the source code currently located at src/ in the
>>> >> > apache/parquet-cpp repository [2] to the cpp/src/ directory located in
>>> >> > apache/arrow [3]
>>> >> >
>>> >> > 2) The parquet code tree would remain separate from the Arrow code
>>> >> > tree, though the two projects will continue to share code as they do
>>> >> > now
>>> >> >
>>> >> > 3) The build system in apache/parquet-cpp would be effectively
>>> >> > deprecated and can be mostly discarded, as it is largely redundant and
>>> >> > duplicated from the build system in apache/arrow
>>> >> >
>>> >> > 4) The Parquet and Arrow C++ communities will collaborate to provide
>>> >> > development workflows to enable contributors working exclusively on
>>> >> > the Parquet core functionality to work unencumbered by
>>> >> > unnecessary build or test dependencies from the rest of the Arrow
>>> >> > codebase. Note that parquet-cpp already builds a significant portion
>>> >> > of Apache Arrow en route to creating its libraries
>>> >> >
>>> >> > 5) The Parquet community can create scripts to "cut" Parquet C++
>>> >> > releases by packaging up the appropriate components and ensuring that
>>> >> > they can be built and installed independently, as they can be now
>>> >> >
>>> >> > 6) The CI processes would be merged -- since we already build the
>>> >> > Parquet libraries in Arrow's CI workflow, this would amount to
>>> >> > building the Parquet unit tests and running them.
>>> >> >
>>> >> > 7) Contributed patches that do not involve Arrow-related functionality
>>> >> > could use the PARQUET-XXXX marking, though some ARROW-XXXX patches may
>>> >> > span both codebases
>>> >> >
>>> >> > 8) Parquet C++ committers can be given push rights on apache/arrow
>>> >> > subject to ongoing good citizenry (e.g. not merging patches that break
>>> >> > builds). The Arrow PMC may need to vote on the procedure for offering
>>> >> > pass-through commit rights to anyone who has been invited to be a
>>> >> > committer for Apache Parquet
>>> >> >
>>> >> > 9) The contributors who work on both Arrow and Parquet will work in
>>> >> > good faith to ensure that the needs of Parquet-only developers (i.e.
>>> >> > who consume Parquet files in some way unrelated to the Arrow columnar
>>> >> > standard) are accommodated
>>> >> >
>>> >> > There are a number of particular details we will need to discuss
>>> >> > further (such as the specific logistics of the codebase surgery; e.g.
>>> >> > how to manage the commit history in apache/parquet-cpp -- do we care
>>> >> > about git blame?)
>>> >> >
>>> >> > This vote is to determine if the Parquet PMC is in favor of working in
>>> >> > good faith to execute on the above plan. I will inquire with the Arrow
>>> >> > PMC to see if we need to have a corresponding vote there, and also how
>>> >> > to handle the management of commit rights.
>>> >> >
>>> >> > [ ] +1: In favor of implementing the proposed monorepo plan
>>> >> > [ ] +0: . . .
>>> >> > [ ] -1: Not in favor because . . .
>>> >> >
>>> >> > Here is my vote: +1.
>>> >> >
>>> >> > Thank you,
>>> >> > Wes
>>> >> >
>>> >> > [1]: https://lists.apache.org/thread.html/4bc135b4e933b959602df48bc3d5978ab7a4299d83d4295da9f498ac@%3Cdev.parquet.apache.org%3E
>>> >> > [2]: https://github.com/apache/parquet-cpp/tree/master/src/parquet
>>> >> > [3]: https://github.com/apache/arrow/tree/master/cpp/src
