Re: Parquet-cpp

Aliaksei Sandryhaila Mon, 25 Jan 2016 12:46:55 -0800

Hi Nong and Julien,

As Wes has pointed out, we have a number of patches for parquet-cpp outstanding. Wes, Deepak, and I have been reviewing each other's pull requests. At this point, the patches need to be reviewed and approved by Parquet committers in order to be committed to master.

Unfortunately, there is not much activity on this side of the project. The lack of response from current committers is holding us back, and we have to repeatedly rebase our patches, merge multiple pull requests together, and overall step on each other's toes.

Is it possible to make Wes, Deepak, and me committers on the project, so we can contribute to parquet-cpp more efficiently?


Thanks,
Aliaksei.


On 01/23/2016 06:07 PM, Wes McKinney wrote:

Folks,

We're working on a pretty solid patch queue.

independent patches
PARQUET-449: https://github.com/apache/parquet-cpp/pull/21

interdependent patches (order to apply patches)
PARQUET-437 (MOSTLY REVIEWED): https://github.com/apache/parquet-cpp/pull/19

PARQUET-418: https://github.com/apache/parquet-cpp/pull/18
PARQUET-434: https://github.com/apache/parquet-cpp/pull/20
PARQUET-433: https://github.com/apache/parquet-cpp/pull/22
PARQUET-451 & PARQUET-453: https://github.com/apache/parquet-cpp/pull/23

PARQUET-428 (needs to be rebased on top of PARQUET-433):
https://github.com/apache/parquet-cpp/pull/24

I'm going to take a breather and work on some other things this weekend,
but I'll be available for code reviews and fixes to try to move along this
patch queue.

Thanks,
Wes

On Fri, Jan 15, 2016 at 8:18 AM, Wes McKinney <[email protected]> wrote:

Great to meet you all!

I've recently been collaborating with the Apache Drill team to spin out
the ValueVector columnar in-memory data structure into a new standalone
project that will be called Arrow [1] [2]. A brief summary of
Arrow/ValueVectors is that it permits O(1) random access on nested columnar
structures and is efficient for projections and scans in a columnar SQL
setting.

I'm very interested in making Parquet read/write support available to
Python programmers via C/C++ extensions, so I'm going to be working the
next few months on a Parquet->Arrow->Python toolchain, along with some
tools to manipulate tables of in-memory columnar data in the style of Python's
pandas library.

I will propose patches as needed to parquet-cpp to improve its performance
and add functionality for writing Parquet files as well. The details of
converting to/from Parquet's repetition/definition level representation of
nested data will stay separate in the arrow-parquet adapter code.

cheers,
Wes

[1]:
http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAJrw0OSVoirU_EUrBBqKY12uDi_f8U9MP7J_6Puuh_DmcyzS9g%40mail.gmail.com%3E
[2]:
http://permalink.gmane.org/gmane.comp.apache.incubator.drill.devel/16490

On Fri, Jan 15, 2016 at 1:22 AM, Mickaël Lacour <[email protected]>
wrote:

Hi,

I'm very interested in this subject because I would like to export
parquet data from HDFS to Vertica (using VSQL).
I'm planning to work on it next quarter, but I will be very happy to help
you on this subject (review, testing).

Have a nice day,
--
Mickaël Lacour
Senior Software Engineer
Analytics Infrastructure team @Scalability

________________________________________
From: Walkauskas, Stephen Gregory (Vertica) <[email protected]>
Sent: Thursday, January 14, 2016 3:23 PM
To: Sandryhaila, Aliaksei; [email protected]; Majeti, Deepak;
[email protected]; Wes McKinney
Subject: Re: Parquet-cpp

Yes, thanks for the introduction Julien.

Nong and Wes,

It'd be interesting to know your goals for parquet-cpp.

The Vertica database already supports optimized reads of ORC files (fast
C++ parser, predicate pushdown, column selection, etc.). We'd like to do
the same for parquet.

Cheers,
Stephen

On 01/13/2016 05:53 PM, Sandryhaila, Aliaksei wrote:

Thank you for the introduction, Julien!

Hello Nong and Wes,

Stephen, Deepak and I are developing a C++ library to support Parquet in
Vertica RDBMS. We are using Parquet-cpp as a starting point and are
expanding its functionality as well as improving it and fixing bugs. We
would like to contribute these improvements back to the open-source
community. We plan to do this through the usual process of creating
JIRAs that justify and explain a code change, and then submitting pull
requests. We look forward to working with you on Parquet-cpp and to your
feedback and suggestions.

Best regards,
Aliaksei.


On 01/13/2016 02:54 PM, Julien Le Dem wrote:

Hello Nong, Wes, Stephen, Deepak and Aliaksei
I wanted to introduce you to each other as you are all looking at
Parquet-cpp.

I'd recommend opening JIRAs in the parquet-cpp component to collaborate (I
see you already doing this):
https://issues.apache.org/jira/browse/PARQUET-418?jql=project%20%3D%20PARQUET%20AND%20component%20%3D%20parquet-cpp

Nong is a committer and can merge pull requests (he also understands that
code base very well).
Other committers can too; feel free to ping us if you need help.
Obviously, you don't need to be a committer to give others reviews (you
just need one to approve and merge).
