Folks,

We're working on a pretty solid patch queue.

Independent patches:

PARQUET-449: https://github.com/apache/parquet-cpp/pull/21

Interdependent patches (apply in this order):

PARQUET-437 (MOSTLY REVIEWED): https://github.com/apache/parquet-cpp/pull/19
PARQUET-418: https://github.com/apache/parquet-cpp/pull/18
PARQUET-434: https://github.com/apache/parquet-cpp/pull/20
PARQUET-433: https://github.com/apache/parquet-cpp/pull/22
PARQUET-451 & PARQUET-453: https://github.com/apache/parquet-cpp/pull/23
PARQUET-428 (needs to be rebased on top of PARQUET-433): https://github.com/apache/parquet-cpp/pull/24

I'm going to take a breather and work on some other things this weekend, but I'll be available for code reviews and fixes to help move this patch queue along.

Thanks,
Wes

On Fri, Jan 15, 2016 at 8:18 AM, Wes McKinney <[email protected]> wrote:
> Great to meet you all!
>
> I've recently been collaborating with the Apache Drill team to spin out
> the ValueVector columnar in-memory data structure into a new standalone
> project that will be called Arrow [1] [2]. A brief summary of
> Arrow/ValueVectors is that it permits O(1) random access on nested columnar
> structures and is efficient for projections and scans in a columnar SQL
> setting.
>
> I'm very interested in making Parquet read/write support available to
> Python programmers via C/C++ extensions, so I'm going to be working over the
> next few months on a Parquet->Arrow->Python toolchain, along with some
> tools to manipulate in-memory columnar data tables in the style of Python's
> pandas library.
>
> I will propose patches as needed to parquet-cpp to improve its performance
> and add functionality for writing Parquet files as well. The details of
> converting to/from Parquet's repetition/definition level representation of
> nested data will stay separate in the arrow-parquet adapter code.
>
> cheers,
> Wes
>
> [1]:
> http://mail-archives.apache.org/mod_mbox/drill-dev/201510.mbox/%3CCAJrw0OSVoirU_EUrBBqKY12uDi_f8U9MP7J_6Puuh_DmcyzS9g%40mail.gmail.com%3E
> [2]:
> http://permalink.gmane.org/gmane.comp.apache.incubator.drill.devel/16490
>
> On Fri, Jan 15, 2016 at 1:22 AM, Mickaël Lacour <[email protected]>
> wrote:
>
>> Hi,
>>
>> I'm very interested in this subject because I would like to export
>> Parquet data from HDFS to Vertica (using VSQL).
>> I'm planning to work on it next quarter, but I will be very happy to help
>> you on this subject (review, testing).
>>
>> Have a nice day,
>> --
>> Mickaël Lacour
>> Senior Software Engineer
>> Analytics Infrastructure team @Scalability
>>
>> ________________________________________
>> From: Walkauskas, Stephen Gregory (Vertica) <[email protected]>
>> Sent: Thursday, January 14, 2016 3:23 PM
>> To: Sandryhaila, Aliaksei; [email protected]; Majeti, Deepak;
>> [email protected]; Wes McKinney
>> Subject: Re: Parquet-cpp
>>
>> Yes, thanks for the introduction, Julien.
>>
>> Nong and Wes,
>>
>> It'd be interesting to know your goals for parquet-cpp.
>>
>> The Vertica database already supports optimized reads of ORC files (fast
>> C++ parser, predicate pushdown, column selection, etc.). We'd like to do
>> the same for Parquet.
>>
>> Cheers,
>> Stephen
>>
>> On 01/13/2016 05:53 PM, Sandryhaila, Aliaksei wrote:
>> > Thank you for the introduction, Julien!
>> >
>> > Hello Nong and Wes,
>> >
>> > Stephen, Deepak and I are developing a C++ library to support Parquet in
>> > the Vertica RDBMS. We are using Parquet-cpp as a starting point and are
>> > expanding its functionality as well as improving it and fixing bugs. We
>> > would like to contribute these improvements back to the open-source
>> > community. We plan to do this through the usual process of creating
>> > JIRAs that justify and explain a code change, and then submitting pull
>> > requests.
>> > We look forward to working with you on Parquet-cpp and to your
>> > feedback and suggestions.
>> >
>> > Best regards,
>> > Aliaksei.
>> >
>> > On 01/13/2016 02:54 PM, Julien Le Dem wrote:
>> >> Hello Nong, Wes, Stephen, Deepak and Aliaksei,
>> >> I wanted to introduce you to each other as you are all looking at
>> >> Parquet-cpp.
>> >>
>> >> I'd recommend opening JIRAs in the parquet-cpp component to
>> >> collaborate (I see you are already doing this):
>> >> https://issues.apache.org/jira/browse/PARQUET-418?jql=project%20%3D%20PARQUET%20AND%20component%20%3D%20parquet-cpp
>> >>
>> >> Nong is a committer and can merge pull requests (he also understands
>> >> that code base very well).
>> >> Other committers can too; feel free to ping us if you need help.
>> >> Obviously, you don't need to be a committer to give others reviews (you
>> >> just need one to approve and merge).
