Also, as Nong mentioned, PR titles should be prefixed with the JIRA ID followed
by a colon, as in "PARQUET-X: description". That's just so the reference
appears in the git changelog. The merge script enforces it.
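For anyone who wants to check a title locally before opening a PR, here is a minimal Python sketch of the "PARQUET-X: description" convention. It is an illustration under stated assumptions, not the merge script's actual check: the regex and the `has_jira_prefix` helper are hypothetical, and the sketch only accepts a single `PARQUET-<number>` ID followed by a colon and a non-empty description.

```python
import re

# Hypothetical pattern (an assumption, not the merge script's own rule):
# "PARQUET-<digits>:" at the start of the title, then a non-empty description.
JIRA_TITLE = re.compile(r"PARQUET-\d+:\s*\S")


def has_jira_prefix(title: str) -> bool:
    """Return True if the title starts with a PARQUET JIRA ID, a colon, and some text."""
    return JIRA_TITLE.match(title) is not None


if __name__ == "__main__":
    for t in ["PARQUET-123: example description", "Fix reader bug", "PARQUET-123 no colon"]:
        print(f"{t!r}: {has_jira_prefix(t)}")
```

Note that `re.match` only anchors at the start of the string, which is all the prefix convention needs; the description itself is left unconstrained.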


On Tue, Jan 26, 2016 at 3:24 PM, Julien Le Dem <[email protected]> wrote:

> I'm also happy with Aliaksei, Deepak, Wes, etc. reviewing each other's PRs.
> I see Nong (who's a committer) has been doing some reviews already.
>
> When you guys reach a consensus on a PR and want it merged please mention
> it in the PR (+1, LGTM) and mention us directly (@julienledem, ...) to have
> it merged.
>
> Right now I see that #19 and #21 have been committed (thanks, Nong), but it
> is not clear to me in what order the others should be committed.
>
> For example, Deepak should comment directly on #22 to approve it. Right now
> he has only mentioned it on another PR:
> https://github.com/apache/parquet-cpp/pull/24#issuecomment-174354139
> Similarly Wes could confirm on that PR whether it looks good.
>
> Tomorrow is the Parquet sync up if you want to discuss further:
> https://plus.google.com/u/0/events/cvgi67jmoptmgb1i488re8scbuo
>
>
> On Mon, Jan 25, 2016 at 4:20 PM, Ryan Blue <[email protected]> wrote:
>
>> Aliaksei, thanks for being understanding here.
>>
>> I agree with you that it is too difficult. We really want to get the cpp
>> side bootstrapped as soon as possible. Let's go with what you suggested:
>> contributors review one another's patches and then ask a committer for
>> a final review once both contributors reach a consensus.
>>
>> If there are issues that are easy to review, maybe some of us other than
>> Nong can take a look.
>>
>> rb
>>
>>
>> On 01/25/2016 02:33 PM, Aliaksei Sandryhaila wrote:
>>
>>> Hi Ryan,
>>>
>>> This sounds very reasonable. I am not arguing that we should disregard
>>> the standard Apache approach to promoting contributors to committers. I
>>> am just pointing out that without input from the current committers it
>>> is hard for us to contribute productively to the project. As a
>>> consequence, it is hard for us to demonstrate that we are a good fit to
>>> become committers in the future. This leaves us in a deadlock, which can
>>> be resolved either by increased feedback from existing committers or by
>>> making us committers sooner.
>>>
>>> I understand that most committers on the Parquet project are working on
>>> the Java implementation, so it can be harder for them to review patches
>>> for parquet-cpp. With that in mind, how about the following protocol for
>>> parquet-cpp pull requests: after contributors review and revise a pull
>>> request and agree that it is in good shape, we will ask a designated
>>> committer to review and commit it. So far we have been asking Nong; if
>>> there is a better designated committer for parquet-cpp, please let us
>>> know.
>>>
>>> Thank you,
>>> Aliaksei.
>>>
>>>
>>> On 01/25/2016 04:54 PM, Ryan Blue wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Sorry about the current backlog on the parquet-cpp side. Most of the
>>>> current committer base works on the Java implementation, so reviews
>>>> from us tend to be slow or unreliable.
>>>>
>>>> I think the best way to move forward is for you to review each other's
>>>> patches. That will keep those issues progressing, make it easy for
>>>> committers to validate each commit, and, most importantly, build a
>>>> trail of contributions that we can look at when voting in new committers.
>>>>
>>>> I completely sympathize with the need for committers on the CPP
>>>> project, but I don't think this will take a long time given the
>>>> current level of activity. We're really just trying to build
>>>> confidence that:
>>>>
>>>> 1. You produce quality contributions and understand the codebase
>>>> 2. You give friendly, thoughtful reviews and don't rubber-stamp
>>>> 3. You defer judgment and ask others when you don't know
>>>> 4. You respect others and interact professionally
>>>>
>>>> I don't think any of those are that hard to demonstrate, but I'd be
>>>> uncomfortable not validating committers like we normally do.
>>>> Especially in this situation, where I could easily see the amount of
>>>> work you guys are doing adding up pretty quickly!
>>>>
>>>> Does that sound like a reasonable path forward?
>>>>
>>>> rb
>>>>
>>>>
>>>> On 01/25/2016 12:46 PM, Aliaksei Sandryhaila wrote:
>>>>
>>>>> Hi Nong and Julien,
>>>>>
>>>>> As Wes has pointed out, we have a number of parquet-cpp patches
>>>>> outstanding. Wes, Deepak, and I have been reviewing each other's pull
>>>>> requests. At this point, the patches need to be reviewed and approved by
>>>>> Parquet committers in order to be committed to master.
>>>>>
>>>>> Unfortunately, there is not much activity on this side of the project.
>>>>> The lack of response from current committers is holding us back, and we
>>>>> have to repeatedly rebase our patches, merge multiple pull requests
>>>>> together, and generally step on each other's toes.
>>>>>
>>>>> Is it possible to make Wes, Deepak, and me committers on the project, so
>>>>> we can contribute to parquet-cpp more efficiently?
>>>>>
>>>>> Thanks,
>>>>> Aliaksei.
>>>>>
>>>>>
>>>>> On 01/23/2016 06:07 PM, Wes McKinney wrote:
>>>>>
>>>>>> Folks,
>>>>>>
>>>>>> We're working on a pretty solid patch queue.
>>>>>>
>>>>>> Independent patches:
>>>>>> PARQUET-449: https://github.com/apache/parquet-cpp/pull/21
>>>>>>
>>>>>> Interdependent patches (in the order they should be applied):
>>>>>> PARQUET-437 (MOSTLY REVIEWED): https://github.com/apache/parquet-cpp/pull/19
>>>>>> PARQUET-418: https://github.com/apache/parquet-cpp/pull/18
>>>>>> PARQUET-434: https://github.com/apache/parquet-cpp/pull/20
>>>>>> PARQUET-433: https://github.com/apache/parquet-cpp/pull/22
>>>>>> PARQUET-451 & PARQUET-453: https://github.com/apache/parquet-cpp/pull/23
>>>>>>
>>>>>> PARQUET-428 (needs to be rebased on top of PARQUET-433):
>>>>>> https://github.com/apache/parquet-cpp/pull/24
>>>>>>
>>>>>> I'm going to take a breather and work on some other things this weekend,
>>>>>> but I'll be available for code reviews and fixes to try to move this
>>>>>> patch queue along.
>>>>>>
>>>>>> Thanks,
>>>>>> Wes
>>>>>>
>>>>>> On Fri, Jan 15, 2016 at 8:18 AM, Wes McKinney <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Great to meet you all!
>>>>>>>
>>>>>>> I've recently been collaborating with the Apache Drill team to spin out
>>>>>>> the ValueVector columnar in-memory data structure into a new standalone
>>>>>>> project that will be called Arrow [1] [2]. In brief, Arrow/ValueVectors
>>>>>>> permits O(1) random access on nested columnar structures and is
>>>>>>> efficient for projections and scans in a columnar SQL setting.
>>>>>>>
>>>>>>> I'm very interested in making Parquet read/write support available to
>>>>>>> Python programmers via C/C++ extensions, so I'm going to spend the next
>>>>>>> few months working on a Parquet->Arrow->Python toolchain, along with
>>>>>>> some tools to manipulate in-memory columnar data tables in the style of
>>>>>>> Python's pandas library.
>>>>>>>
>>>>>>> I will propose patches to parquet-cpp as needed to improve its
>>>>>>> performance and to add functionality for writing Parquet files. The
>>>>>>> details of converting to/from Parquet's repetition/definition level
>>>>>>> representation of nested data will stay separate, in the arrow-parquet
>>>>>>> adapter code.
>>>>>>>
>>>>>>> cheers,
>>>>>>> Wes
>>>>>>>
>>>>>>> [1]: http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAJrw0OSVoirU_EUrBBqKY12uDi_f8U9MP7J_6Puuh_DmcyzS9g%40mail.gmail.com%3E
>>>>>>>
>>>>>>> [2]: http://permalink.gmane.org/gmane.comp.apache.incubator.drill.devel/16490
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 15, 2016 at 1:22 AM, Mickaël Lacour <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm very interested in this subject because I would like to export
>>>>>>>> Parquet data from HDFS to Vertica (using VSQL).
>>>>>>>> I'm planning to work on it next quarter, but I would be very happy to
>>>>>>>> help you with this (reviewing, testing).
>>>>>>>>
>>>>>>>> Have a nice day,
>>>>>>>> --
>>>>>>>> Mickaël Lacour
>>>>>>>> Senior Software Engineer
>>>>>>>> Analytics Infrastructure team @Scalability
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Walkauskas, Stephen Gregory (Vertica)
>>>>>>>> <[email protected]>
>>>>>>>> Sent: Thursday, January 14, 2016 3:23 PM
>>>>>>>> To: Sandryhaila, Aliaksei; [email protected]; Majeti, Deepak;
>>>>>>>> [email protected]; Wes McKinney
>>>>>>>> Subject: Re: Parquet-cpp
>>>>>>>>
>>>>>>>> Yes, thanks for the introduction Julien.
>>>>>>>>
>>>>>>>> Nong and Wes,
>>>>>>>>
>>>>>>>> It'd be interesting to know your goals for parquet-cpp.
>>>>>>>>
>>>>>>>> The Vertica database already supports optimized reads of ORC files
>>>>>>>> (fast C++ parser, predicate pushdown, column selection, etc.). We'd
>>>>>>>> like to do the same for Parquet.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Stephen
>>>>>>>>
>>>>>>>> On 01/13/2016 05:53 PM, Sandryhaila, Aliaksei wrote:
>>>>>>>>
>>>>>>>>> Thank you for the introduction, Julien!
>>>>>>>>>
>>>>>>>>> Hello Nong and Wes,
>>>>>>>>>
>>>>>>>>> Stephen, Deepak and I are developing a C++ library to support Parquet
>>>>>>>>> in the Vertica RDBMS. We are using parquet-cpp as a starting point and
>>>>>>>>> are expanding its functionality, improving it, and fixing bugs. We
>>>>>>>>> would like to contribute these improvements back to the open-source
>>>>>>>>> community. We plan to do this through the usual process of creating
>>>>>>>>> JIRAs that justify and explain each code change, and then submitting
>>>>>>>>> pull requests. We look forward to working with you on parquet-cpp and
>>>>>>>>> to your feedback and suggestions.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Aliaksei.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 01/13/2016 02:54 PM, Julien Le Dem wrote:
>>>>>>>>>
>>>>>>>>>> Hello Nong, Wes, Stephen, Deepak and Aliaksei,
>>>>>>>>>> I wanted to introduce you to each other as you are all looking at
>>>>>>>>>> parquet-cpp.
>>>>>>>>>>
>>>>>>>>>> I'd recommend opening JIRAs in the parquet-cpp component to
>>>>>>>>>> collaborate (I see you are already doing this):
>>>>>>>>>> https://issues.apache.org/jira/browse/PARQUET-418?jql=project%20%3D%20PARQUET%20AND%20component%20%3D%20parquet-cpp
>>>>>>>>>>
>>>>>>>>>> Nong is a committer and can merge pull requests (he also understands
>>>>>>>>>> that code base very well). Other committers can too; feel free to
>>>>>>>>>> ping us if you need help.
>>>>>>>>>> Obviously, you don't need to be a committer to review others' patches
>>>>>>>>>> (you just need one to approve and merge).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Cloudera, Inc.
>>
>
>
>
> --
> Julien
>



-- 
Julien
