Aliaksei, thanks for being understanding here.

I agree with you that it is too difficult. We really want to get the cpp side bootstrapped as soon as possible. Let's go with what you suggested: have contributors review one another's patches and then ask a committer for a final review once both contributors reach consensus.

If there are issues that are easy to review, maybe some of us other than Nong can take a look.

rb

On 01/25/2016 02:33 PM, Aliaksei Sandryhaila wrote:
Hi Ryan,

This sounds very reasonable. I am not arguing for disregarding the standard
Apache approach to promoting contributors to committers. I am just
pointing out that without the input from current committers it is hard
for us to productively contribute to the project. As a consequence, it
is hard for us to demonstrate our fit to become committers in the future.
This leaves us in a deadlock, which can be resolved either by
increased feedback from existing committers or by making us committers
sooner.

I understand that most committers on the Parquet project are working on
the Java implementation, so it can be harder for them to review patches
for parquet-cpp. In this regard, how about the following protocol for
parquet-cpp pull requests: After contributors review and revise a pull
request and agree that it is in good shape, we will ask a designated
committer to review and commit the pull request. So far we have been
asking Nong; if there is a better designated committer for parquet-cpp,
please let us know.

Thank you,
Aliaksei.


On 01/25/2016 04:54 PM, Ryan Blue wrote:
Hi everyone,

Sorry about the current backlog on the parquet-cpp side. Most of the
current committer base works on the Java implementation so it's either
slow or not reliable for us to do those reviews.

I think the best way to move forward is to review patches for each
other. That will keep those issues progressing, make it easy for
committers to validate the commit, and -- most importantly -- build
a trail of contributions that we can look at to vote in new committers.

I completely sympathize with the need for committers on the CPP
project, but I don't think this will take a long time given the
current level of activity. We're really just trying to build
confidence that:

1. You produce quality contributions and understand the codebase
2. You give friendly, thoughtful reviews and don't rubber-stamp
3. You defer judgment and ask others when you don't know
4. You respect others and interact professionally

I don't think any of those are that hard to demonstrate, but I'd be
uncomfortable not validating committers like we normally do.
Especially in this situation, where I could easily see the amount of
work you guys are doing adding up pretty quickly!

Does that sound like a reasonable path forward?

rb


On 01/25/2016 12:46 PM, Aliaksei Sandryhaila wrote:
Hi Nong and Julien,

As Wes has pointed out, we have a number of patches for parquet-cpp
outstanding. Wes, Deepak, and I have been reviewing each other's pull
requests. At this point, the patches need to be reviewed and approved by
Parquet committers in order to be committed to master.

Unfortunately, there is not much activity on this side of the project.
The lack of response from current committers is holding us back, and we
have to repeatedly rebase our patches, merge multiple pull requests
together, and overall step on each other's toes.

Is it possible to make Wes, Deepak, and me committers on the project, so
we can contribute to parquet-cpp more efficiently?

Thanks,
Aliaksei.


On 01/23/2016 06:07 PM, Wes McKinney wrote:
Folks,

We're working on a pretty solid patch queue.

independent patches
PARQUET-449: https://github.com/apache/parquet-cpp/pull/21

interdependent patches (order to apply patches)
PARQUET-437 (MOSTLY REVIEWED):
https://github.com/apache/parquet-cpp/pull/19

PARQUET-418: https://github.com/apache/parquet-cpp/pull/18
PARQUET-434: https://github.com/apache/parquet-cpp/pull/20
PARQUET-433: https://github.com/apache/parquet-cpp/pull/22
PARQUET-451 & PARQUET-453:
https://github.com/apache/parquet-cpp/pull/23

PARQUET-428 (needs to be rebased on top of PARQUET-433):
https://github.com/apache/parquet-cpp/pull/24

I'm going to take a breather and work on some other things this weekend,
but I'll be available for code reviews and fixes to try to move along this
patch queue.

Thanks,
Wes

On Fri, Jan 15, 2016 at 8:18 AM, Wes McKinney <w...@cloudera.com> wrote:

Great to meet you all!

I've recently been collaborating with the Apache Drill team to spin out
the ValueVector columnar in-memory data structure into a new standalone
project that will be called Arrow [1] [2]. A brief summary of
Arrow/ValueVectors is that it permits O(1) random access on nested columnar
structures and is efficient for projections and scans in a columnar SQL
setting.

I'm very interested in making Parquet read/write support available to
Python programmers via C/C++ extensions, so I'm going to be working the
next few months on a Parquet->Arrow->Python toolchain, along with some
tools to manipulate tables of in-memory columnar data in the style of
Python's pandas library.

I will propose patches as needed to parquet-cpp to improve its performance
and add functionality for writing Parquet files as well. The details of
converting to/from Parquet's repetition/definition level representation of
nested data will stay separate in the arrow-parquet adapter code.

cheers,
Wes

[1]:
http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAJrw0OSVoirU_EUrBBqKY12uDi_f8U9MP7J_6Puuh_DmcyzS9g%40mail.gmail.com%3E


[2]:
http://permalink.gmane.org/gmane.comp.apache.incubator.drill.devel/16490


On Fri, Jan 15, 2016 at 1:22 AM, Mickaël Lacour <m.lac...@criteo.com> wrote:

Hi,

I'm very interested in this subject because I would like to export
Parquet data from HDFS to Vertica (using VSQL).
I'm planning to work on it next quarter, but I will be very happy to help
you with this (review, testing).

Have a nice day,
--
Mickaël Lacour
Senior Software Engineer
Analytics Infrastructure team @Scalability

________________________________________
From: Walkauskas, Stephen Gregory (Vertica) <stephen.walkaus...@hpe.com>
Sent: Thursday, January 14, 2016 3:23 PM
To: Sandryhaila, Aliaksei; dev@parquet.apache.org; Majeti, Deepak;
non...@gmail.com; Wes McKinney
Subject: Re: Parquet-cpp

Yes, thanks for the introduction Julien.

Nong and Wes,

It'd be interesting to know your goals for parquet-cpp.

The Vertica database already supports optimized reads of ORC files (fast
C++ parser, predicate pushdown, column selection, etc.). We'd like to do
the same for Parquet.

Cheers,
Stephen

On 01/13/2016 05:53 PM, Sandryhaila, Aliaksei wrote:
Thank you for the introduction, Julien!

Hello Nong and Wes,

Stephen, Deepak and I are developing a C++ library to support Parquet in
Vertica RDBMS. We are using Parquet-cpp as a starting point and are
expanding its functionality as well as improving it and fixing bugs. We
would like to contribute these improvements back to the open-source
community. We plan to do this through the usual process of creating
JIRAs that justify and explain a code change, and then submitting pull
requests. We look forward to working with you on Parquet-cpp and to your
feedback and suggestions.

Best regards,
Aliaksei.


On 01/13/2016 02:54 PM, Julien Le Dem wrote:
Hello Nong, Wes, Stephen, Deepak and Aliaksei,
I wanted to introduce you to each other as you are all looking at
Parquet-cpp.

I'd recommend opening JIRAs in the parquet-cpp component to collaborate (I
see you are already doing this):

https://issues.apache.org/jira/browse/PARQUET-418?jql=project%20%3D%20PARQUET%20AND%20component%20%3D%20parquet-cpp


Nong is a committer and can merge pull requests (he also understands that
code base very well).
Other committers can too; feel free to ping us if you need help.
Obviously, you don't need to be a committer to give others reviews (you
just need one to approve and merge).


--
Ryan Blue
Software Engineer
Cloudera, Inc.
