Also, as Nong mentioned, PR titles should be prefixed with the JIRA id followed by a colon, as in "PARQUET-X: description". That's just so the reference shows up in the git changelog. The merge script enforces it.
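To illustrate the title convention, here is a minimal sketch of a prefix check. It is not the actual merge script, whose rules aren't shown in this thread. The regex, the function name `jira_id_from_title`, and the sample titles are hypothetical, and it assumes "X" stands for a numeric JIRA id.

```python
import re
from typing import Optional

# Hypothetical rule (assumption, not the real merge script): "PARQUET-" plus a
# numeric id, then a colon, then a non-empty description.
PR_TITLE_PATTERN = re.compile(r"^(PARQUET-\d+):\s*(\S.*)$")


def jira_id_from_title(title: str) -> Optional[str]:
    """Return the JIRA id if the title follows "PARQUET-X: description", else None."""
    match = PR_TITLE_PATTERN.match(title.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    # Illustrative titles only; these do not refer to real pull requests.
    for title in ["PARQUET-1234: Example description", "Fix reader bug"]:
        jira_id = jira_id_from_title(title)
        status = f"ok ({jira_id})" if jira_id else "rejected: missing 'PARQUET-X:' prefix"
        print(f"{title!r}: {status}")
```

A title with the id but no description, such as "PARQUET-1234:", is rejected too, since the point of the prefix is a readable changelog entry.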
On Tue, Jan 26, 2016 at 3:24 PM, Julien Le Dem <[email protected]> wrote:

> I'm also happy with Aliaksei, Deepak, Wes, etc. reviewing each other's work.
> I see Nong (who's a committer) has been doing some reviews already.
>
> When you reach a consensus on a PR and want it merged, please say so
> in the PR (+1, LGTM) and mention us directly (@julienledem, ...) to have
> it merged.
>
> Right now I see that #19 and #21 have been committed (thanks Nong), but it
> is not clear to me in what order the others should be committed.
>
> For example, Deepak should comment directly on #22 to approve it. Right now
> he has only mentioned it on another PR:
> https://github.com/apache/parquet-cpp/pull/24#issuecomment-174354139
> Similarly, Wes could confirm on that PR whether it looks good.
>
> Tomorrow is the Parquet sync-up if you want to discuss further:
> https://plus.google.com/u/0/events/cvgi67jmoptmgb1i488re8scbuo
>
>
> On Mon, Jan 25, 2016 at 4:20 PM, Ryan Blue <[email protected]> wrote:
>
>> Aliaksei, thanks for being understanding here.
>>
>> I agree with you that it is too difficult. We really want to get the cpp
>> side bootstrapped as soon as possible. Let's go with what you suggested:
>> have contributors review one another's patches and then ask a committer
>> for a final review once both contributors reach a consensus.
>>
>> If there are issues that are easy to review, maybe some of us other than
>> Nong can take a look.
>>
>> rb
>>
>>
>> On 01/25/2016 02:33 PM, Aliaksei Sandryhaila wrote:
>>
>>> Hi Ryan,
>>>
>>> This sounds very reasonable. I am not arguing for disregarding the
>>> standard Apache approach to promoting contributors to committers. I am
>>> just pointing out that without input from current committers it is hard
>>> for us to contribute productively to the project. As a consequence, it
>>> is hard for us to demonstrate that we are fit to become committers in
>>> the future.
>>> This leaves us in a deadlock, which can be resolved either by
>>> increased feedback from existing committers or by making us committers
>>> sooner.
>>>
>>> I understand that most committers on the Parquet project are working on
>>> the Java implementation, so it can be harder for them to review patches
>>> for parquet-cpp. In this regard, how about the following protocol for
>>> parquet-cpp pull requests: after contributors review and revise a pull
>>> request and agree that it is in good shape, we will ask a designated
>>> committer to review and commit it. So far we have been asking Nong; if
>>> there is a better designated committer for parquet-cpp, please let us
>>> know.
>>>
>>> Thank you,
>>> Aliaksei.
>>>
>>>
>>> On 01/25/2016 04:54 PM, Ryan Blue wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Sorry about the current backlog on the parquet-cpp side. Most of the
>>>> current committer base works on the Java implementation, so our reviews
>>>> of those patches are either slow or unreliable.
>>>>
>>>> I think the best way to move forward is for you to review patches for
>>>> each other. That will keep those issues progressing, make it easy for
>>>> committers to validate the commit, and -- most importantly -- build
>>>> a trail of contributions that we can look at to vote in new committers.
>>>>
>>>> I completely sympathize with the need for committers on the CPP
>>>> project, but I don't think this will take a long time given the
>>>> current level of activity. We're really just trying to build
>>>> confidence that:
>>>>
>>>> 1. You produce quality contributions and understand the codebase
>>>> 2. You give friendly, thoughtful reviews and don't rubber-stamp
>>>> 3. You defer judgment and ask others when you don't know
>>>> 4. You respect others and interact professionally
>>>>
>>>> I don't think any of those are that hard to demonstrate, but I'd be
>>>> uncomfortable not validating committers like we normally do.
>>>> That's especially true in this situation, where I could easily see the
>>>> amount of work you are doing adding up pretty quickly!
>>>>
>>>> Does that sound like a reasonable path forward?
>>>>
>>>> rb
>>>>
>>>>
>>>> On 01/25/2016 12:46 PM, Aliaksei Sandryhaila wrote:
>>>>
>>>>> Hi Nong and Julien,
>>>>>
>>>>> As Wes has pointed out, we have a number of parquet-cpp patches
>>>>> outstanding. Wes, Deepak, and I have been reviewing each other's pull
>>>>> requests. At this point, the patches need to be reviewed and approved
>>>>> by Parquet committers in order to be committed to master.
>>>>>
>>>>> Unfortunately, there is not much activity on this side of the project.
>>>>> The lack of response from current committers is holding us back: we
>>>>> have to repeatedly rebase our patches, merge multiple pull requests
>>>>> together, and generally step on each other's toes.
>>>>>
>>>>> Is it possible to make Wes, Deepak, and me committers on the project,
>>>>> so we can contribute to parquet-cpp more efficiently?
>>>>>
>>>>> Thanks,
>>>>> Aliaksei.
>>>>>
>>>>>
>>>>> On 01/23/2016 06:07 PM, Wes McKinney wrote:
>>>>>
>>>>>> Folks,
>>>>>>
>>>>>> We're working on a pretty solid patch queue.
>>>>>>
>>>>>> Independent patches:
>>>>>> PARQUET-449: https://github.com/apache/parquet-cpp/pull/21
>>>>>>
>>>>>> Interdependent patches (in the order they should be applied):
>>>>>> PARQUET-437 (MOSTLY REVIEWED):
>>>>>> https://github.com/apache/parquet-cpp/pull/19
>>>>>>
>>>>>> PARQUET-418: https://github.com/apache/parquet-cpp/pull/18
>>>>>> PARQUET-434: https://github.com/apache/parquet-cpp/pull/20
>>>>>> PARQUET-433: https://github.com/apache/parquet-cpp/pull/22
>>>>>> PARQUET-451 & PARQUET-453:
>>>>>> https://github.com/apache/parquet-cpp/pull/23
>>>>>>
>>>>>> PARQUET-428 (needs to be rebased on top of PARQUET-433):
>>>>>> https://github.com/apache/parquet-cpp/pull/24
>>>>>>
>>>>>> I'm going to take a breather and work on some other things this
>>>>>> weekend, but I'll be available for code reviews and fixes to help
>>>>>> move this patch queue along.
>>>>>>
>>>>>> Thanks,
>>>>>> Wes
>>>>>>
>>>>>> On Fri, Jan 15, 2016 at 8:18 AM, Wes McKinney <[email protected]> wrote:
>>>>>>
>>>>>>> Great to meet you all!
>>>>>>>
>>>>>>> I've recently been collaborating with the Apache Drill team to spin
>>>>>>> out the ValueVector columnar in-memory data structure into a new
>>>>>>> standalone project that will be called Arrow [1] [2]. In brief,
>>>>>>> Arrow/ValueVectors permits O(1) random access on nested columnar
>>>>>>> structures and is efficient for projections and scans in a columnar
>>>>>>> SQL setting.
>>>>>>>
>>>>>>> I'm very interested in making Parquet read/write support available to
>>>>>>> Python programmers via C/C++ extensions, so over the next few months
>>>>>>> I'm going to be working on a Parquet->Arrow->Python toolchain, along
>>>>>>> with some tools to manipulate in-memory columnar tables in the style
>>>>>>> of Python's pandas library.
>>>>>>>
>>>>>>> I will propose patches to parquet-cpp as needed to improve its
>>>>>>> performance and to add functionality for writing Parquet files. The
>>>>>>> details of converting to/from Parquet's repetition/definition level
>>>>>>> representation of nested data will stay separate, in the
>>>>>>> arrow-parquet adapter code.
>>>>>>>
>>>>>>> cheers,
>>>>>>> Wes
>>>>>>>
>>>>>>> [1]:
>>>>>>> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAJrw0OSVoirU_EUrBBqKY12uDi_f8U9MP7J_6Puuh_DmcyzS9g%40mail.gmail.com%3E
>>>>>>>
>>>>>>> [2]:
>>>>>>> http://permalink.gmane.org/gmane.comp.apache.incubator.drill.devel/16490
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 15, 2016 at 1:22 AM, Mickaël Lacour <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm very interested in this subject because I would like to export
>>>>>>>> Parquet data from HDFS to Vertica (using VSQL).
>>>>>>>> I'm planning to work on it next quarter, but I will be very happy to
>>>>>>>> help you with this (review, testing).
>>>>>>>>
>>>>>>>> Have a nice day,
>>>>>>>> --
>>>>>>>> Mickaël Lacour
>>>>>>>> Senior Software Engineer
>>>>>>>> Analytics Infrastructure team @Scalability
>>>>>>>>
>>>>>>>> ________________________________________
>>>>>>>> From: Walkauskas, Stephen Gregory (Vertica) <[email protected]>
>>>>>>>> Sent: Thursday, January 14, 2016 3:23 PM
>>>>>>>> To: Sandryhaila, Aliaksei; [email protected]; Majeti, Deepak;
>>>>>>>> [email protected]; Wes McKinney
>>>>>>>> Subject: Re: Parquet-cpp
>>>>>>>>
>>>>>>>> Yes, thanks for the introduction, Julien.
>>>>>>>>
>>>>>>>> Nong and Wes,
>>>>>>>>
>>>>>>>> It'd be interesting to know your goals for parquet-cpp.
>>>>>>>>
>>>>>>>> The Vertica database already supports optimized reads of ORC files
>>>>>>>> (fast C++ parser, predicate pushdown, column selection, etc.). We'd
>>>>>>>> like to do the same for Parquet.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Stephen
>>>>>>>>
>>>>>>>> On 01/13/2016 05:53 PM, Sandryhaila, Aliaksei wrote:
>>>>>>>>
>>>>>>>>> Thank you for the introduction, Julien!
>>>>>>>>>
>>>>>>>>> Hello Nong and Wes,
>>>>>>>>>
>>>>>>>>> Stephen, Deepak and I are developing a C++ library to support
>>>>>>>>> Parquet in Vertica RDBMS. We are using Parquet-cpp as a starting
>>>>>>>>> point and are expanding its functionality as well as improving it
>>>>>>>>> and fixing bugs. We would like to contribute these improvements back
>>>>>>>>> to the open-source community. We plan to do this through the usual
>>>>>>>>> process of creating JIRAs that justify and explain a code change,
>>>>>>>>> and then submitting pull requests. We look forward to working with
>>>>>>>>> you on Parquet-cpp and to your feedback and suggestions.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Aliaksei.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 01/13/2016 02:54 PM, Julien Le Dem wrote:
>>>>>>>>>
>>>>>>>>>> Hello Nong, Wes, Stephen, Deepak and Aliaksei,
>>>>>>>>>> I wanted to introduce you to each other, as you are all looking at
>>>>>>>>>> Parquet-cpp.
>>>>>>>>>>
>>>>>>>>>> I'd recommend opening JIRAs in the parquet-cpp component to
>>>>>>>>>> collaborate (I see you are already doing this):
>>>>>>>>>> https://issues.apache.org/jira/browse/PARQUET-418?jql=project%20%3D%20PARQUET%20AND%20component%20%3D%20parquet-cpp
>>>>>>>>>>
>>>>>>>>>> Nong is a committer and can merge pull requests (he also
>>>>>>>>>> understands that code base very well).
>>>>>>>>>> Other committers can too; feel free to ping us if you need help.
>>>>>>>>>> Obviously, you don't need to be a committer to review each other's
>>>>>>>>>> work (you just need a committer to approve and merge).
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Cloudera, Inc.
>>
>
> --
> Julien

--
Julien
