Also, things have been made much worse by Travis CI continuing to have
infrastructure problems. The ASF build queue on Travis CI had completely
stalled by this morning so that no builds were completing; fortunately
their support is quite responsive and they've resolved the queue
blockage, so builds are executing again.
On Tue, Jan 26, 2016 at 4:00 PM, Wes McKinney <[email protected]
<mailto:[email protected]>> wrote:
There are 3 more patches outstanding that are causing blockage (418,
433, and 451/453), so I think if we get them merged today or
tomorrow then we should be able to proceed with some parallel
efforts without quite as much conflict.
On Tue, Jan 26, 2016 at 3:56 PM, Nong Li <[email protected]
<mailto:[email protected]>> wrote:
I'm going to try to be more active this week but I admittedly don't
have a lot of time to work on this. I understand we need to get
critical mass in committers, code, etc. to keep this going but I
think we're making good progress.
On Tue, Jan 26, 2016 at 3:27 PM, Julien Le Dem
<[email protected] <mailto:[email protected]>> wrote:
Also as Nong mentioned, PRs should be prefixed by the jira
id followed by a ":" as follows "PARQUET-X: description"
that's just to have the reference in the git changelog. The
merge script enforces it.
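
The title convention described above can be sketched as a small check. This is only an illustration: the actual logic in the merge script is not shown in this thread, so the pattern, the function name `is_valid_pr_title`, and the sample titles below are all assumptions. It also handles only a single JIRA id per title.

```python
import re

# Hypothetical pattern: "PARQUET-<number>: <non-empty description>".
# This is an illustration, not the check used by the real merge script.
TITLE_PATTERN = re.compile(r"^PARQUET-\d+: \S.*$")


def is_valid_pr_title(title: str) -> bool:
    """Return True if the title is a JIRA id, a colon, a space, and a description."""
    return TITLE_PATTERN.match(title) is not None


if __name__ == "__main__":
    # Made-up sample titles, used only to exercise the check.
    for title in ["PARQUET-433: Example description", "Fix typo in README"]:
        print(title, "->", is_valid_pr_title(title))
```

A check like this could run before a merge so that every commit message carries its JIRA reference into the git changelog.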
On Tue, Jan 26, 2016 at 3:24 PM, Julien Le Dem
<[email protected] <mailto:[email protected]>> wrote:
I'm happy too with Aliaksei, Deepak, Wes, etc reviewing
each other.
I see Nong (who's a committer) has been doing some
reviews already.
When you guys reach a consensus on a PR and want it
merged please mention it in the PR (+1, LGTM) and
mention us directly (@julienledem, ...) to have it merged.
Right now I see that #19 and #21 have been committed
(thanks Nong) but it is not clear to me in what order
the others should be committed.
For example Deepak should comment directly on #22 to
approve it. Right now he mentioned it on another PR.
https://github.com/apache/parquet-cpp/pull/24#issuecomment-174354139
Similarly Wes could confirm on that PR whether it looks
good.
Tomorrow is the Parquet sync up if you want to discuss
further:
https://plus.google.com/u/0/events/cvgi67jmoptmgb1i488re8scbuo
On Mon, Jan 25, 2016 at 4:20 PM, Ryan Blue
<[email protected] <mailto:[email protected]>> wrote:
Aliaksei, thanks for being understanding here.
I agree with you that it is too difficult. We really
want to get the cpp side bootstrapped as soon as
possible. Let's go with what you suggested, to have
contributors review one another's patches and then
ask a committer for a final review once both
contributors reach a consensus.
If there are issues that are easy to review, maybe
some of us other than Nong can take a look.
rb
On 01/25/2016 02:33 PM, Aliaksei Sandryhaila wrote:
Hi Ryan,
This sounds very reasonable. I do not argue to
disregard the standard
Apache approach to promoting contributors to
committers. I am just
pointing out that without the input from current
committers it is hard
for us to productively contribute to the
project. As a consequence, it
is hard for us to demonstrate our fit to become
committers in the future.
This leaves us in a deadlock, which can be
resolved either by
increased feedback from existing committers or
by making us committers
sooner.
I understand that most committers on the Parquet
project are working on
the Java implementation, so it can be harder for
them to review patches
for parquet-cpp. In this regard, how about the
following protocol for
parquet-cpp pull requests: After contributors
review and revise a pull
request and agree that it is in good shape, we
will ask a designated
committer to review and commit the pull request.
So far we have been
asking Nong; if there is a better designated
committer for parquet-cpp,
please let us know.
Thank you,
Aliaksei.
On 01/25/2016 04:54 PM, Ryan Blue wrote:
Hi everyone,
Sorry about the current backlog on the
parquet-cpp side. Most of the
current committer base works on the Java
implementation so it's either
slow or not reliable for us to do those reviews.
I think the best way to move forward is to
review patches for each
other. That will keep those issues
progressing, make it easy for
committers to validate the commit, and --
most importantly -- to build
a trail of contributions that we can look at
to vote in new committers.
I completely sympathize with the need for
committers on the CPP
project, but I don't think this will take a
long time given the
current level of activity. We're really just
trying to build
confidence that:
1. You produce quality contributions and
understand the codebase
2. You give friendly, thoughtful reviews and
don't rubber-stamp
3. You defer judgment and ask others when
you don't know
4. You respect others and interact
professionally
I don't think any of those are that hard to
demonstrate, but I'd be
uncomfortable not validating committers like
we normally do.
Especially in this situation, where I could
easily see the amount of
work you guys are doing adding up pretty
quickly!
Does that sound like a reasonable path forward?
rb
On 01/25/2016 12:46 PM, Aliaksei Sandryhaila
wrote:
Hi Nong and Julien,
As Wes has pointed out, we have a number
of patches for parquet-cpp
outstanding. Wes, Deepak, and I have
been reviewing each other's pull
requests. At this point, the patches
need to be reviewed and approved by
Parquet committers in order to be
committed to master.
Unfortunately, there is not much
activity on this side of the project.
The lack of response from current
committers is holding us back, and we
have to repeatedly rebase our patches,
merge multiple pull requests
together, and overall step on each
other's toes.
Is it possible to make Wes, Deepak, and
me committers on the project, so
we can contribute to parquet-cpp more
efficiently?
Thanks,
Aliaksei.
On 01/23/2016 06:07 PM, Wes McKinney wrote:
Folks,
We're working on a pretty solid
patch queue.
independent patches
PARQUET-449:
https://github.com/apache/parquet-cpp/pull/21
interdependent patches (order to
apply patches)
PARQUET-437 (MOSTLY REVIEWED):
https://github.com/apache/parquet-cpp/pull/19
PARQUET-418:
https://github.com/apache/parquet-cpp/pull/18
PARQUET-434:
https://github.com/apache/parquet-cpp/pull/20
PARQUET-433:
https://github.com/apache/parquet-cpp/pull/22
PARQUET-451 & PARQUET-453:
https://github.com/apache/parquet-cpp/pull/23
PARQUET-428 (needs to be rebased on
top of PARQUET-433):
https://github.com/apache/parquet-cpp/pull/24
I'm going to take a breather and
work on some other things this
weekend,
but I'll be available for code
reviews and fixes to try to move along
this
patch queue.
Thanks,
Wes
On Fri, Jan 15, 2016 at 8:18 AM, Wes
McKinney <[email protected]
<mailto:[email protected]>> wrote:
Great to meet you all!
I've recently been collaborating
with the Apache Drill team to spin
out
the ValueVector columnar
in-memory data structure into a new
standalone
project that will be called
Arrow [1] [2]. A brief summary of
Arrow/ValueVectors is that it
permits O(1) random access on nested
columnar
structures and is efficient for
projections and scans in a columnar
SQL
setting.
I'm very interested in making
Parquet read/write support
available to
Python programmers via C/C++
extensions, so I'm going to be
working
the
next few months on a
Parquet->Arrow->Python
toolchain, along with some
tools to manipulate tabular
in-memory columnar data in the
style of
Python's
pandas library.
I will propose patches as needed
to parquet-cpp to improve its
performance
and add functionality for
writing Parquet files as well. The
details of
converting to/from Parquet's
repetition/definition level
representation of
nested data will stay separate
in the arrow-parquet adapter code.
cheers,
Wes
[1]:
http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAJrw0OSVoirU_EUrBBqKY12uDi_f8U9MP7J_6Puuh_DmcyzS9g%40mail.gmail.com%3E
[2]:
http://permalink.gmane.org/gmane.comp.apache.incubator.drill.devel/16490
On Fri, Jan 15, 2016 at 1:22 AM,
Mickaël Lacour
<[email protected]
<mailto:[email protected]>>
wrote:
Hi,
I'm very interested in this
subject because I would like
to export
parquet data from HDFS to
Vertica (using VSQL).
I'm planning to work on it
next quarter, but I will be
very happy to
help
you on this subject (review,
testing).
Have a nice day,
--
Mickaël Lacour
Senior Software Engineer
Analytics Infrastructure
team @Scalability
________________________________________
From: Walkauskas, Stephen
Gregory (Vertica)
<[email protected]
<mailto:[email protected]>>
Sent: Thursday, January 14,
2016 3:23 PM
To: Sandryhaila, Aliaksei;
[email protected]
<mailto:[email protected]>;
Majeti, Deepak;
[email protected]
<mailto:[email protected]>;
Wes McKinney
Subject: Re: Parquet-cpp
Yes, thanks for the
introduction Julien.
Nong and Wes,
It'd be interesting to know
your goals for parquet-cpp.
The Vertica database already
supports optimized reads of
ORC files
(fast
c++ parser, predicate
pushdown, columns selection
etc). We'd like
to do
the same for parquet.
Cheers,
Stephen
On 01/13/2016 05:53 PM,
Sandryhaila, Aliaksei wrote:
Thank you for the
introduction, Julien!
Hello Nong and Wes,
Stephen, Deepak and I
are developing a C++
library to support
Parquet in
Vertica RDBMS. We are
using Parquet-cpp as a
starting point and are
expanding its
functionality as well as
improving it and fixing
bugs. We
would like to contribute
these improvements back
to the open-source
community. We plan to do
this through the usual
process of creating
jiras that justify and
explain a code change,
and then submitting
pull
requests. We look
forward to working with
you on Parquet-cpp and to
your
feedback and suggestions.
Best regards,
Aliaksei.
On 01/13/2016 02:54 PM,
Julien Le Dem wrote:
Hello Nong, Wes,
Stephen, Deepak and
Aliaksei
I wanted to
introduce you to
each other as you
are all looking at
Parquet-cpp.
I'd recommend
opening JIRAs in the
parquet-cpp component to
collaborate (I
see you already
doing this):
https://issues.apache.org/jira/browse/PARQUET-418?jql=project%20%3D%20PARQUET%20AND%20component%20%3D%20parquet-cpp
Nong is a committer
and can merge pull
requests (he also
understands
that
code base very well).
Other committers can
too, feel free to
ping us if you need help.
Obviously, you don't
need to be a
committer to give
others reviews
(you
just need one to
approve and merge).
--
Ryan Blue
Software Engineer
Cloudera, Inc.
--
Julien
--
Julien