There are 3 more patches outstanding that are causing blockage (418, 433, and 451/453), so I think if we get them merged today or tomorrow then we should be able to proceed with some parallel efforts without quite as much conflict.

On Tue, Jan 26, 2016 at 3:56 PM, Nong Li <[email protected]> wrote:

> I'm going to try to be more active this week but I admittedly don't have
> a lot of time to work on this. I understand we need to get critical mass
> in committers, code, etc. to keep this going but I think we're making
> good progress.
>
> On Tue, Jan 26, 2016 at 3:27 PM, Julien Le Dem <[email protected]> wrote:
>
>> Also as Nong mentioned, PRs should be prefixed by the jira id followed
>> by a ":" as follows: "PARQUET-X: description". That's just to have the
>> reference in the git changelog. The merge script enforces it.
>>
>>
>> On Tue, Jan 26, 2016 at 3:24 PM, Julien Le Dem <[email protected]> wrote:
>>
>>> I'm happy too with Aliaksei, Deepak, Wes, etc. reviewing each other.
>>> I see Nong (who's a committer) has been doing some reviews already.
>>>
>>> When you guys reach a consensus on a PR and want it merged please
>>> mention it in the PR (+1, LGTM) and mention us directly
>>> (@julienledem, ...) to have it merged.
>>>
>>> Right now I see that #19 and #21 have been committed (thanks Nong) but
>>> it is not clear to me in what order the others should be committed.
>>>
>>> For example Deepak should comment directly on #22 to approve it. Right
>>> now he mentioned it on another PR.
>>> https://github.com/apache/parquet-cpp/pull/24#issuecomment-174354139
>>> Similarly Wes could confirm on that PR whether it looks good.
>>>
>>> Tomorrow is the Parquet sync up if you want to discuss further:
>>> https://plus.google.com/u/0/events/cvgi67jmoptmgb1i488re8scbuo
>>>
>>>
>>> On Mon, Jan 25, 2016 at 4:20 PM, Ryan Blue <[email protected]> wrote:
>>>
>>>> Aliaksei, thanks for being understanding here.
>>>>
>>>> I agree with you that it is too difficult. We really want to get the
>>>> cpp side bootstrapped as soon as possible. Let's go with what you
>>>> suggested, to have contributors review one another's patches and then
>>>> ask a committer for a final review once both contributors reach a
>>>> consensus.
>>>>
>>>> If there are issues that are easy to review, maybe some of us other
>>>> than Nong can take a look.
>>>>
>>>> rb
>>>>
>>>>
>>>> On 01/25/2016 02:33 PM, Aliaksei Sandryhaila wrote:
>>>>
>>>>> Hi Ryan,
>>>>>
>>>>> This sounds very reasonable. I do not argue to disregard the standard
>>>>> Apache approach to promoting contributors to committers. I am just
>>>>> pointing out that without the input from current committers it is hard
>>>>> for us to productively contribute to the project. As a consequence, it
>>>>> is hard for us to demonstrate our fit to become committers in the
>>>>> future. This leaves us in a deadlock, which can be resolved either by
>>>>> increased feedback from existing committers or by making us committers
>>>>> sooner.
>>>>>
>>>>> I understand that most committers on the Parquet project are working on
>>>>> the Java implementation, so it can be harder for them to review patches
>>>>> for parquet-cpp. In this regard, how about the following protocol for
>>>>> parquet-cpp pull requests: After contributors review and revise a pull
>>>>> request and agree that it is in good shape, we will ask a designated
>>>>> committer to review and commit the pull request. So far we have been
>>>>> asking Nong; if there is a better designated committer for parquet-cpp,
>>>>> please let us know.
>>>>>
>>>>> Thank you,
>>>>> Aliaksei.
>>>>>
>>>>>
>>>>> On 01/25/2016 04:54 PM, Ryan Blue wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Sorry about the current backlog on the parquet-cpp side. Most of the
>>>>>> current committer base works on the Java implementation so it's either
>>>>>> slow or not reliable for us to do those reviews.
>>>>>>
>>>>>> I think the best way to move forward is to review patches for each
>>>>>> other. That will keep those issues progressing, make it easy for
>>>>>> committers to validate the commit, and -- most importantly -- build
>>>>>> a trail of contributions that we can look at to vote in new
>>>>>> committers.
>>>>>>
>>>>>> I completely sympathize with the need for committers on the CPP
>>>>>> project, but I don't think this will take a long time given the
>>>>>> current level of activity. We're really just trying to build
>>>>>> confidence that:
>>>>>>
>>>>>> 1. You produce quality contributions and understand the codebase
>>>>>> 2. You give friendly, thoughtful reviews and don't rubber-stamp
>>>>>> 3. You defer judgment and ask others when you don't know
>>>>>> 4. You respect others and interact professionally
>>>>>>
>>>>>> I don't think any of those are that hard to demonstrate, but I'd be
>>>>>> uncomfortable not validating committers like we normally do.
>>>>>> Especially in this situation, where I could easily see the amount of
>>>>>> work you guys are doing adding up pretty quickly!
>>>>>>
>>>>>> Does that sound like a reasonable path forward?
>>>>>>
>>>>>> rb
>>>>>>
>>>>>>
>>>>>> On 01/25/2016 12:46 PM, Aliaksei Sandryhaila wrote:
>>>>>>
>>>>>>> Hi Nong and Julien,
>>>>>>>
>>>>>>> As Wes has pointed out, we have a number of patches for parquet-cpp
>>>>>>> outstanding. Wes, Deepak, and I have been reviewing each other's pull
>>>>>>> requests. At this point, the patches need to be reviewed and approved
>>>>>>> by Parquet committers in order to be committed to master.
>>>>>>>
>>>>>>> Unfortunately, there is not much activity on this side of the project.
>>>>>>> The lack of response from current committers is holding us back, and
>>>>>>> we have to repeatedly rebase our patches, merge multiple pull requests
>>>>>>> together, and overall step on each other's toes.
>>>>>>>
>>>>>>> Is it possible to make Wes, Deepak, and me committers on the project,
>>>>>>> so we can contribute to parquet-cpp more efficiently?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Aliaksei.
>>>>>>>
>>>>>>>
>>>>>>> On 01/23/2016 06:07 PM, Wes McKinney wrote:
>>>>>>>
>>>>>>>> Folks,
>>>>>>>>
>>>>>>>> We're working on a pretty solid patch queue.
>>>>>>>>
>>>>>>>> independent patches
>>>>>>>> PARQUET-449: https://github.com/apache/parquet-cpp/pull/21
>>>>>>>>
>>>>>>>> interdependent patches (order to apply patches)
>>>>>>>> PARQUET-437 (MOSTLY REVIEWED):
>>>>>>>> https://github.com/apache/parquet-cpp/pull/19
>>>>>>>>
>>>>>>>> PARQUET-418: https://github.com/apache/parquet-cpp/pull/18
>>>>>>>> PARQUET-434: https://github.com/apache/parquet-cpp/pull/20
>>>>>>>> PARQUET-433: https://github.com/apache/parquet-cpp/pull/22
>>>>>>>> PARQUET-451 & PARQUET-453:
>>>>>>>> https://github.com/apache/parquet-cpp/pull/23
>>>>>>>>
>>>>>>>> PARQUET-428 (needs to be rebased on top of PARQUET-433):
>>>>>>>> https://github.com/apache/parquet-cpp/pull/24
>>>>>>>>
>>>>>>>> I'm going to take a breather and work on some other things this
>>>>>>>> weekend, but I'll be available for code reviews and fixes to try to
>>>>>>>> move this patch queue along.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Wes
>>>>>>>>
>>>>>>>> On Fri, Jan 15, 2016 at 8:18 AM, Wes McKinney <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Great to meet you all!
>>>>>>>>>
>>>>>>>>> I've recently been collaborating with the Apache Drill team to spin
>>>>>>>>> out the ValueVector columnar in-memory data structure into a new
>>>>>>>>> standalone project that will be called Arrow [1] [2]. A brief summary
>>>>>>>>> of Arrow/ValueVectors is that it permits O(1) random access on nested
>>>>>>>>> columnar structures and is efficient for projections and scans in a
>>>>>>>>> columnar SQL setting.
>>>>>>>>>
>>>>>>>>> I'm very interested in making Parquet read/write support available to
>>>>>>>>> Python programmers via C/C++ extensions, so I'm going to be working
>>>>>>>>> the next few months on a Parquet->Arrow->Python toolchain, along with
>>>>>>>>> some tools to manipulate tables of in-memory columnar data in the
>>>>>>>>> style of Python's pandas library.
>>>>>>>>>
>>>>>>>>> I will propose patches as needed to parquet-cpp to improve its
>>>>>>>>> performance and add functionality for writing Parquet files as well.
>>>>>>>>> The details of converting to/from Parquet's repetition/definition
>>>>>>>>> level representation of nested data will stay separate in the
>>>>>>>>> arrow-parquet adapter code.
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>> Wes
>>>>>>>>>
>>>>>>>>> [1]: http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAJrw0OSVoirU_EUrBBqKY12uDi_f8U9MP7J_6Puuh_DmcyzS9g%40mail.gmail.com%3E
>>>>>>>>>
>>>>>>>>> [2]: http://permalink.gmane.org/gmane.comp.apache.incubator.drill.devel/16490
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jan 15, 2016 at 1:22 AM, Mickaël Lacour <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I'm very interested in this subject because I would like to export
>>>>>>>>>> parquet data from HDFS to Vertica (using VSQL).
>>>>>>>>>> I'm planning to work on it next quarter, but I will be very happy to
>>>>>>>>>> help you on this subject (review, testing).
>>>>>>>>>>
>>>>>>>>>> Have a nice day,
>>>>>>>>>> --
>>>>>>>>>> Mickaël Lacour
>>>>>>>>>> Senior Software Engineer
>>>>>>>>>> Analytics Infrastructure team @Scalability
>>>>>>>>>>
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: Walkauskas, Stephen Gregory (Vertica) <[email protected]>
>>>>>>>>>> Sent: Thursday, January 14, 2016 3:23 PM
>>>>>>>>>> To: Sandryhaila, Aliaksei; [email protected]; Majeti, Deepak; [email protected]; Wes McKinney
>>>>>>>>>> Subject: Re: Parquet-cpp
>>>>>>>>>>
>>>>>>>>>> Yes, thanks for the introduction Julien.
>>>>>>>>>>
>>>>>>>>>> Nong and Wes,
>>>>>>>>>>
>>>>>>>>>> It'd be interesting to know your goals for parquet-cpp.
>>>>>>>>>>
>>>>>>>>>> The Vertica database already supports optimized reads of ORC files
>>>>>>>>>> (fast C++ parser, predicate pushdown, column selection, etc.). We'd
>>>>>>>>>> like to do the same for Parquet.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Stephen
>>>>>>>>>>
>>>>>>>>>> On 01/13/2016 05:53 PM, Sandryhaila, Aliaksei wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you for the introduction, Julien!
>>>>>>>>>>>
>>>>>>>>>>> Hello Nong and Wes,
>>>>>>>>>>>
>>>>>>>>>>> Stephen, Deepak and I are developing a C++ library to support
>>>>>>>>>>> Parquet in Vertica RDBMS. We are using Parquet-cpp as a starting
>>>>>>>>>>> point and are expanding its functionality as well as improving it
>>>>>>>>>>> and fixing bugs. We would like to contribute these improvements
>>>>>>>>>>> back to the open-source community. We plan to do this through the
>>>>>>>>>>> usual process of creating jiras that justify and explain a code
>>>>>>>>>>> change, and then submitting pull requests. We look forward to
>>>>>>>>>>> working with you on Parquet-cpp and to your feedback and
>>>>>>>>>>> suggestions.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Aliaksei.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 01/13/2016 02:54 PM, Julien Le Dem wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello Nong, Wes, Stephen, Deepak and Aliaksei
>>>>>>>>>>>> I wanted to introduce you to each other as you are all looking at
>>>>>>>>>>>> Parquet-cpp.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd recommend opening JIRAs in the parquet-cpp component to
>>>>>>>>>>>> collaborate (I see you already doing this):
>>>>>>>>>>>> https://issues.apache.org/jira/browse/PARQUET-418?jql=project%20%3D%20PARQUET%20AND%20component%20%3D%20parquet-cpp
>>>>>>>>>>>>
>>>>>>>>>>>> Nong is a committer and can merge pull requests (he also
>>>>>>>>>>>> understands that code base very well).
>>>>>>>>>>>> Other committers can too, feel free to ping us if you need help.
>>>>>>>>>>>> Obviously, you don't need to be a committer to give others reviews
>>>>>>>>>>>> (you just need one to approve and merge).
>>>>>>>>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Cloudera, Inc.
>>>>
>>>
>>> --
>>> Julien
>>>
>>
>> --
>> Julien
>>
