I am happy to help out with patch maintenance when there are conflicts.
With PARQUET-437, we'll want to write more unit tests, which will help make
sure we aren't breaking each other's code.

On Mon, Jan 25, 2016 at 2:33 PM, Aliaksei Sandryhaila <[email protected]>
wrote:

> Hi Ryan,
>
> This sounds very reasonable. I am not arguing that we disregard the standard
> Apache approach to promoting contributors to committers. I am just pointing
> out that without input from current committers it is hard for us to
> productively contribute to the project. As a consequence, it is hard for us to
> demonstrate our fit to become committers in the future. This leaves us in a
> deadlock, which can be resolved either by increased feedback from
> existing committers or by making us committers sooner.
>
> I understand that most committers on the Parquet project are working on
> the Java implementation, so it can be harder for them to review patches for
> parquet-cpp. In this regard, how about the following protocol for
> parquet-cpp pull requests: After contributors review and revise a pull
> request and agree that it is in good shape, we will ask a designated
> committer to review and commit the pull request. So far we have been asking
> Nong; if there is a better designated committer for parquet-cpp, please let
> us know.
>
> Thank you,
> Aliaksei.
>
>
>
> On 01/25/2016 04:54 PM, Ryan Blue wrote:
>
>> Hi everyone,
>>
>> Sorry about the current backlog on the parquet-cpp side. Most of the
>> current committer base works on the Java implementation, so it's either slow
>> or unreliable for us to do those reviews.
>>
>> I think the best way to move forward is to review patches for each other.
>> That will keep those issues progressing, make it easy for committers to
>> validate the commit, and -- most importantly -- build a trail of
>> contributions that we can look at to vote in new committers.
>>
>> I completely sympathize with the need for committers on the CPP project,
>> but I don't think this will take a long time given the current level of
>> activity. We're really just trying to build confidence that:
>>
>> 1. You produce quality contributions and understand the codebase
>> 2. You give friendly, thoughtful reviews and don't rubber-stamp
>> 3. You defer judgment and ask others when you don't know
>> 4. You respect others and interact professionally
>>
>> I don't think any of those are that hard to demonstrate, but I'd be
>> uncomfortable not validating committers like we normally do. Especially in
>> this situation, where I could easily see the amount of work you guys are
>> doing adding up pretty quickly!
>>
>> Does that sound like a reasonable path forward?
>>
>> rb
>>
>>
>> On 01/25/2016 12:46 PM, Aliaksei Sandryhaila wrote:
>>
>>> Hi Nong and Julien,
>>>
>>> As Wes has pointed out, we have a number of patches for parquet-cpp
>>> outstanding. Wes, Deepak, and I have been reviewing each other's pull
>>> requests. At this point, the patches need to be reviewed and approved by
>>> Parquet committers in order to be committed to master.
>>>
>>> Unfortunately, there is not much activity on this side of the project.
>>> The lack of response from current committers is holding us back, and we
>>> have to repeatedly rebase our patches, merge multiple pull requests
>>> together, and overall step on each other's toes.
>>>
>>> Is it possible to make Wes, Deepak, and me committers on the project, so
>>> we can contribute to parquet-cpp more efficiently?
>>>
>>> Thanks,
>>> Aliaksei.
>>>
>>>
>>> On 01/23/2016 06:07 PM, Wes McKinney wrote:
>>>
>>>> Folks,
>>>>
>>>> We're working on a pretty solid patch queue.
>>>>
>>>> independent patches
>>>> PARQUET-449: https://github.com/apache/parquet-cpp/pull/21
>>>>
>>>> interdependent patches (listed in the order they should be applied)
>>>> PARQUET-437 (MOSTLY REVIEWED):
>>>> https://github.com/apache/parquet-cpp/pull/19
>>>>
>>>> PARQUET-418: https://github.com/apache/parquet-cpp/pull/18
>>>> PARQUET-434: https://github.com/apache/parquet-cpp/pull/20
>>>> PARQUET-433: https://github.com/apache/parquet-cpp/pull/22
>>>> PARQUET-451 & PARQUET-453:
>>>> https://github.com/apache/parquet-cpp/pull/23
>>>>
>>>> PARQUET-428 (needs to be rebased on top of PARQUET-433):
>>>> https://github.com/apache/parquet-cpp/pull/24
>>>>
>>>> I'm going to take a breather and work on some other things this weekend,
>>>> but I'll be available for code reviews and fixes to try to move along
>>>> this patch queue.
>>>>
>>>> Thanks,
>>>> Wes
>>>>
>>>> On Fri, Jan 15, 2016 at 8:18 AM, Wes McKinney <[email protected]> wrote:
>>>>
>>>>> Great to meet you all!
>>>>>
>>>>> I've recently been collaborating with the Apache Drill team to spin out
>>>>> the ValueVector columnar in-memory data structure into a new standalone
>>>>> project that will be called Arrow [1] [2]. A brief summary of
>>>>> Arrow/ValueVectors is that it permits O(1) random access on nested
>>>>> columnar structures and is efficient for projections and scans in a
>>>>> columnar SQL setting.
>>>>>
>>>>> I'm very interested in making Parquet read/write support available to
>>>>> Python programmers via C/C++ extensions, so I'm going to be working over
>>>>> the next few months on a Parquet->Arrow->Python toolchain, along with
>>>>> some tools to manipulate tables of in-memory columnar data in the style
>>>>> of Python's pandas library.
>>>>>
>>>>> I will propose patches as needed to parquet-cpp to improve its
>>>>> performance and add functionality for writing Parquet files as well. The
>>>>> details of converting to/from Parquet's repetition/definition level
>>>>> representation of nested data will stay separate in the arrow-parquet
>>>>> adapter code.
>>>>>
>>>>> cheers,
>>>>> Wes
>>>>>
>>>>> [1]:
>>>>>
>>>>> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAJrw0OSVoirU_EUrBBqKY12uDi_f8U9MP7J_6Puuh_DmcyzS9g%40mail.gmail.com%3E
>>>>>
>>>>> [2]:
>>>>>
>>>>> http://permalink.gmane.org/gmane.comp.apache.incubator.drill.devel/16490
>>>>>
>>>>> On Fri, Jan 15, 2016 at 1:22 AM, Mickaël Lacour <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm very interested in this subject because I would like to export
>>>>>> Parquet data from HDFS to Vertica (using VSQL).
>>>>>> I'm planning to work on it next quarter, but I will be very happy to
>>>>>> help you with this (review, testing).
>>>>>>
>>>>>> Have a nice day,
>>>>>> --
>>>>>> Mickaël Lacour
>>>>>> Senior Software Engineer
>>>>>> Analytics Infrastructure team @Scalability
>>>>>>
>>>>>> ________________________________________
>>>>>> From: Walkauskas, Stephen Gregory (Vertica)
>>>>>> <[email protected]>
>>>>>> Sent: Thursday, January 14, 2016 3:23 PM
>>>>>> To: Sandryhaila, Aliaksei; [email protected]; Majeti, Deepak;
>>>>>> [email protected]; Wes McKinney
>>>>>> Subject: Re: Parquet-cpp
>>>>>>
>>>>>> Yes, thanks for the introduction, Julien.
>>>>>>
>>>>>> Nong and Wes,
>>>>>>
>>>>>> It'd be interesting to know your goals for parquet-cpp.
>>>>>>
>>>>>> The Vertica database already supports optimized reads of ORC files
>>>>>> (fast C++ parser, predicate pushdown, column selection, etc.). We'd
>>>>>> like to do the same for Parquet.
>>>>>>
>>>>>> Cheers,
>>>>>> Stephen
>>>>>>
>>>>>> On 01/13/2016 05:53 PM, Sandryhaila, Aliaksei wrote:
>>>>>>
>>>>>>> Thank you for the introduction, Julien!
>>>>>>>
>>>>>>> Hello Nong and Wes,
>>>>>>>
>>>>>>> Stephen, Deepak and I are developing a C++ library to support Parquet
>>>>>>> in Vertica RDBMS. We are using Parquet-cpp as a starting point and are
>>>>>>> expanding its functionality as well as improving it and fixing bugs.
>>>>>>> We would like to contribute these improvements back to the open-source
>>>>>>> community. We plan to do this through the usual process of creating
>>>>>>> JIRAs that justify and explain a code change, and then submitting pull
>>>>>>> requests. We look forward to working with you on Parquet-cpp and to
>>>>>>> your feedback and suggestions.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Aliaksei.
>>>>>>>
>>>>>>>
>>>>>>> On 01/13/2016 02:54 PM, Julien Le Dem wrote:
>>>>>>>
>>>>>>>> Hello Nong, Wes, Stephen, Deepak and Aliaksei
>>>>>>>> I wanted to introduce you to each other as you are all looking at
>>>>>>>> Parquet-cpp.
>>>>>>>>
>>>>>>>> I'd recommend opening JIRAs in the parquet-cpp component to
>>>>>>>> collaborate (I see you are already doing this):
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/PARQUET-418?jql=project%20%3D%20PARQUET%20AND%20component%20%3D%20parquet-cpp
>>>>>>>>
>>>>>>>> Nong is a committer and can merge pull requests (he also understands
>>>>>>>> that code base very well).
>>>>>>>> Other committers can too; feel free to ping us if you need help.
>>>>>>>> Obviously, you don't need to be a committer to give others reviews
>>>>>>>> (you just need one to approve and merge).
>>>>>>>>
>>>>>>>>
>>>>>
>>>
>>
>>
>
