I am happy to help out with the patch maintenance when there are conflicts. With PARQUET-437, we'll want to write more unit tests, which will help make sure we aren't breaking each other's code.
On Mon, Jan 25, 2016 at 2:33 PM, Aliaksei Sandryhaila <[email protected]> wrote:
> Hi Ryan,
>
> This sounds very reasonable. I am not arguing that we disregard the standard
> Apache approach to promoting contributors to committers. I am just pointing
> out that without input from current committers it is hard for us to
> contribute productively to the project. As a consequence, it is hard for us
> to demonstrate that we are fit to become committers in the future. This
> leaves us in a deadlock, which can be resolved either by increased feedback
> from existing committers or by making us committers sooner.
>
> I understand that most committers on the Parquet project are working on
> the Java implementation, so it can be harder for them to review patches for
> parquet-cpp. In this regard, how about the following protocol for
> parquet-cpp pull requests: after contributors review and revise a pull
> request and agree that it is in good shape, we will ask a designated
> committer to review and commit the pull request. So far we have been asking
> Nong; if there is a better designated committer for parquet-cpp, please let
> us know.
>
> Thank you,
> Aliaksei.
>
> On 01/25/2016 04:54 PM, Ryan Blue wrote:
>
>> Hi everyone,
>>
>> Sorry about the current backlog on the parquet-cpp side. Most of the
>> current committer base works on the Java implementation, so our reviews
>> there are either slow or unreliable.
>>
>> I think the best way to move forward is to review patches for each other.
>> That will keep those issues progressing, make it easy for committers to
>> validate the commit, and -- most importantly -- build a trail of
>> contributions that we can look at to vote in new committers.
>>
>> I completely sympathize with the need for committers on the CPP project,
>> but I don't think this will take a long time given the current level of
>> activity. We're really just trying to build confidence that:
>>
>> 1. You produce quality contributions and understand the codebase
>> 2. You give friendly, thoughtful reviews and don't rubber-stamp
>> 3. You defer judgment and ask others when you don't know
>> 4. You respect others and interact professionally
>>
>> I don't think any of those are that hard to demonstrate, but I'd be
>> uncomfortable not validating committers like we normally do, especially in
>> this situation, where I could easily see the amount of work you guys are
>> doing adding up pretty quickly!
>>
>> Does that sound like a reasonable path forward?
>>
>> rb
>>
>> On 01/25/2016 12:46 PM, Aliaksei Sandryhaila wrote:
>>
>>> Hi Nong and Julien,
>>>
>>> As Wes has pointed out, we have a number of outstanding patches for
>>> parquet-cpp. Wes, Deepak, and I have been reviewing each other's pull
>>> requests. At this point, the patches need to be reviewed and approved by
>>> Parquet committers in order to be committed to master.
>>>
>>> Unfortunately, there is not much activity on this side of the project.
>>> The lack of response from current committers is holding us back, and we
>>> have to repeatedly rebase our patches, merge multiple pull requests
>>> together, and generally step on each other's toes.
>>>
>>> Is it possible to make Wes, Deepak, and me committers on the project, so
>>> we can contribute to parquet-cpp more efficiently?
>>>
>>> Thanks,
>>> Aliaksei.
>>>
>>> On 01/23/2016 06:07 PM, Wes McKinney wrote:
>>>
>>>> Folks,
>>>>
>>>> We're working on a pretty solid patch queue.
>>>>
>>>> Independent patches:
>>>> PARQUET-449: https://github.com/apache/parquet-cpp/pull/21
>>>>
>>>> Interdependent patches (listed in the order they should be applied):
>>>> PARQUET-437 (MOSTLY REVIEWED):
>>>> https://github.com/apache/parquet-cpp/pull/19
>>>>
>>>> PARQUET-418: https://github.com/apache/parquet-cpp/pull/18
>>>> PARQUET-434: https://github.com/apache/parquet-cpp/pull/20
>>>> PARQUET-433: https://github.com/apache/parquet-cpp/pull/22
>>>> PARQUET-451 & PARQUET-453:
>>>> https://github.com/apache/parquet-cpp/pull/23
>>>>
>>>> PARQUET-428 (needs to be rebased on top of PARQUET-433):
>>>> https://github.com/apache/parquet-cpp/pull/24
>>>>
>>>> I'm going to take a breather and work on some other things this weekend,
>>>> but I'll be available for code reviews and fixes to try to move this
>>>> patch queue along.
>>>>
>>>> Thanks,
>>>> Wes
>>>>
>>>> On Fri, Jan 15, 2016 at 8:18 AM, Wes McKinney <[email protected]> wrote:
>>>>
>>>>> Great to meet you all!
>>>>>
>>>>> I've recently been collaborating with the Apache Drill team to spin out
>>>>> the ValueVector columnar in-memory data structure into a new standalone
>>>>> project that will be called Arrow [1] [2]. In brief, Arrow/ValueVectors
>>>>> permits O(1) random access on nested columnar structures and is
>>>>> efficient for projections and scans in a columnar SQL setting.
>>>>>
>>>>> I'm very interested in making Parquet read/write support available to
>>>>> Python programmers via C/C++ extensions, so I'm going to be working over
>>>>> the next few months on a Parquet->Arrow->Python toolchain, along with
>>>>> some tools to manipulate in-memory columnar tables in the style of
>>>>> Python's pandas library.
>>>>>
>>>>> I will propose patches to parquet-cpp as needed to improve its
>>>>> performance and add functionality for writing Parquet files as well.
>>>>> The details of converting to/from Parquet's repetition/definition level
>>>>> representation of nested data will stay separate, in the arrow-parquet
>>>>> adapter code.
>>>>>
>>>>> cheers,
>>>>> Wes
>>>>>
>>>>> [1]:
>>>>> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAJrw0OSVoirU_EUrBBqKY12uDi_f8U9MP7J_6Puuh_DmcyzS9g%40mail.gmail.com%3E
>>>>>
>>>>> [2]:
>>>>> http://permalink.gmane.org/gmane.comp.apache.incubator.drill.devel/16490
>>>>>
>>>>> On Fri, Jan 15, 2016 at 1:22 AM, Mickaël Lacour <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm very interested in this subject because I would like to export
>>>>>> Parquet data from HDFS to Vertica (using VSQL).
>>>>>> I'm planning to work on it next quarter, but I will be very happy to
>>>>>> help you with this (review, testing).
>>>>>>
>>>>>> Have a nice day,
>>>>>> --
>>>>>> Mickaël Lacour
>>>>>> Senior Software Engineer
>>>>>> Analytics Infrastructure team @Scalability
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Walkauskas, Stephen Gregory (Vertica)
>>>>>> <[email protected]>
>>>>>> Sent: Thursday, January 14, 2016 3:23 PM
>>>>>> To: Sandryhaila, Aliaksei; [email protected]; Majeti, Deepak;
>>>>>> [email protected]; Wes McKinney
>>>>>> Subject: Re: Parquet-cpp
>>>>>>
>>>>>> Yes, thanks for the introduction, Julien.
>>>>>>
>>>>>> Nong and Wes,
>>>>>>
>>>>>> It'd be interesting to know your goals for parquet-cpp.
>>>>>>
>>>>>> The Vertica database already supports optimized reads of ORC files
>>>>>> (fast C++ parser, predicate pushdown, column selection, etc.). We'd
>>>>>> like to do the same for Parquet.
>>>>>>
>>>>>> Cheers,
>>>>>> Stephen
>>>>>>
>>>>>> On 01/13/2016 05:53 PM, Sandryhaila, Aliaksei wrote:
>>>>>>
>>>>>>> Thank you for the introduction, Julien!
>>>>>>>
>>>>>>> Hello Nong and Wes,
>>>>>>>
>>>>>>> Stephen, Deepak, and I are developing a C++ library to support
>>>>>>> Parquet in the Vertica RDBMS.
>>>>>>> We are using Parquet-cpp as a starting point and are expanding its
>>>>>>> functionality, improving it, and fixing bugs. We would like to
>>>>>>> contribute these improvements back to the open-source community. We
>>>>>>> plan to do this through the usual process of creating JIRAs that
>>>>>>> justify and explain each code change, and then submitting pull
>>>>>>> requests. We look forward to working with you on Parquet-cpp and to
>>>>>>> your feedback and suggestions.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Aliaksei.
>>>>>>>
>>>>>>> On 01/13/2016 02:54 PM, Julien Le Dem wrote:
>>>>>>>
>>>>>>>> Hello Nong, Wes, Stephen, Deepak, and Aliaksei,
>>>>>>>> I wanted to introduce you to each other, as you are all looking at
>>>>>>>> Parquet-cpp.
>>>>>>>>
>>>>>>>> I'd recommend opening JIRAs in the parquet-cpp component to
>>>>>>>> collaborate (I see you are already doing this):
>>>>>>>> https://issues.apache.org/jira/browse/PARQUET-418?jql=project%20%3D%20PARQUET%20AND%20component%20%3D%20parquet-cpp
>>>>>>>>
>>>>>>>> Nong is a committer and can merge pull requests (he also understands
>>>>>>>> that code base very well).
>>>>>>>> Other committers can too; feel free to ping us if you need help.
>>>>>>>> Obviously, you don't need to be a committer to give others reviews
>>>>>>>> (you just need one to approve and merge).
