I’m CC’ing the main parquet-cpp contributors. You’re right that there has been
a lot of progress recently.
(some of the goals are to provide Python and Vertica access to Parquet files)
I’ll let them comment on the progress.
You’re certainly welcome to contribute.
Julien

> On Aug 3, 2016, at 8:00 AM, Jim Pivarski <[email protected]> wrote:
> 
> Hi,
> 
> I'd like to use parquet-cpp for High Energy Physics (HEP) and possibly
> contribute to the core to support that use-case, but I'm having trouble
> determining the status of the C++ project.
> 
> Most HEP data is stored in the ROOT file format (
> https://root.cern.ch/root/InputOutput.html), which represents complex,
> nested, cross-referenced C++ objects with a columnar layout so that a
> subset of fields can be individually read, individually compressed, and
> quickly scanned. I believe Parquet can provide these same benefits, with
> the added advantage that it is a standard with a specification that can
> be read and written in multiple languages. (Parquet can't be used as a
> randomly writable object database, but that feature of ROOT isn't
> widely used.)
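> 
> To make the data shapes concrete, here is a toy sketch of the kind of
> nested, cross-referenced object I have in mind (the classes are my own
> made-up examples, not real ROOT or HEP code):
> 
>     #include <vector>
> 
>     struct Track { double px, py, pz; };
> 
>     struct Muon {
>       float pt;
>       Track* track;             // cross-reference to another stored object
>     };
> 
>     struct Event {
>       long id;
>       std::vector<Muon> muons;  // nested, variable-length collection
>     };
> 
> In a columnar layout, all of the Event ids are stored together, all of
> the Muon pts are stored together, and so on, so reading one field
> touches only that field's column.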
> 
> To convert between ROOT and Parquet, I would need to map ROOT's
> "StreamerInfo" object schema (https://root.cern.ch/root/SchemaEvolution.html)
> onto a logical type definition, on par with AvroRecordReader, but one
> that also supports pointer references (as an Int64 -> object map).
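> 
> For concreteness, here is a rough sketch of how I imagine the schema
> side of that mapping, assuming parquet-cpp's schema node builders
> (parquet::schema::PrimitiveNode and GroupNode, with the header name as
> in the current tree). The field names are hypothetical, and the Int64
> "track_ref" column stands in for a ROOT pointer, keying into a
> separately stored set of Track columns:
> 
>     #include <parquet/schema.h>
> 
>     using parquet::Repetition;
>     using parquet::Type;
>     using parquet::schema::GroupNode;
>     using parquet::schema::NodePtr;
>     using parquet::schema::NodeVector;
>     using parquet::schema::PrimitiveNode;
> 
>     NodePtr MakeEventSchema() {
>       // Each muon carries its own fields plus an int64 reference that
>       // plays the role of a ROOT pointer.
>       NodeVector muon_fields;
>       muon_fields.push_back(
>           PrimitiveNode::Make("pt", Repetition::REQUIRED, Type::FLOAT));
>       muon_fields.push_back(
>           PrimitiveNode::Make("track_ref", Repetition::OPTIONAL, Type::INT64));
> 
>       NodeVector event_fields;
>       event_fields.push_back(
>           PrimitiveNode::Make("id", Repetition::REQUIRED, Type::INT64));
>       event_fields.push_back(
>           GroupNode::Make("muons", Repetition::REPEATED, muon_fields));
> 
>       return GroupNode::Make("Event", Repetition::REQUIRED, event_fields);
>     }
> 
> (A repeated group is the simplest encoding here; Parquet's LIST
> annotation would be the more standard spelling, but the reference idea
> is the same either way.)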
> 
> Parquet C++'s TODO (https://github.com/apache/parquet-cpp/blob/master/TODO)
> states that this record abstraction, along with nested schemas and
> file writing, has not been implemented. However, the TODO is also two
> years old, whereas I see a burst of activity on GitHub this year. Is
> the TODO out of date?
> 
> Will any of the core developers be at KDD16 (http://www.kdd.org/kdd2016/)
> or elsewhere in San Francisco on August 15 or 16? If so, could we meet
> in person to talk in detail about where the hooks I'm looking for live
> and how I can contribute? (Or *when* I should contribute, if a major
> refactoring is in the works.)
> 
> Thanks!
> -- Jim
