hi Jim,

Cool to hear about this use case. My gut feeling is that we should not expand the scope of the parquet-cpp library itself much beyond the computational details of constructing the encoded streams / metadata and writing them to a file stream, or decoding a file into the raw values stored in each column.
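Concretely, the low-level surface I mean is the ColumnWriter-style path, where the caller hands parquet-cpp the values plus repetition/definition levels and the library only worries about encoding, metadata, and file layout. Something like the sketch below; the exact headers and signatures vary between parquet-cpp versions, so treat it as illustrative rather than exact:

// Hedged sketch of the low-level parquet-cpp write path: the caller supplies
// raw values plus definition (and, for nested data, repetition) levels, and
// the library handles encoding, metadata, and file layout. Header paths and
// exact signatures differ between releases, so adjust to your version.
#include <arrow/io/file.h>
#include <parquet/api/writer.h>
#include <parquet/exception.h>

#include <memory>
#include <vector>

int main() {
  using parquet::Repetition;
  using parquet::Type;
  using parquet::schema::GroupNode;
  using parquet::schema::PrimitiveNode;

  // One optional int64 column "x"; OPTIONAL gives a single definition level.
  auto schema = std::static_pointer_cast<GroupNode>(GroupNode::Make(
      "schema", Repetition::REQUIRED,
      {PrimitiveNode::Make("x", Repetition::OPTIONAL, Type::INT64)}));

  std::shared_ptr<arrow::io::FileOutputStream> sink;
  PARQUET_ASSIGN_OR_THROW(sink,
                          arrow::io::FileOutputStream::Open("example.parquet"));

  std::unique_ptr<parquet::ParquetFileWriter> writer =
      parquet::ParquetFileWriter::Open(sink, schema);
  parquet::RowGroupWriter* rg = writer->AppendRowGroup();

  // Four slots, one of them null: definition level 0 marks the null slot,
  // so only three values are supplied.
  std::vector<int16_t> def_levels = {1, 0, 1, 1};
  std::vector<int64_t> values = {1, 2, 3};

  auto* col = static_cast<parquet::Int64Writer*>(rg->NextColumn());
  col->WriteBatch(static_cast<int64_t>(def_levels.size()), def_levels.data(),
                  /*rep_levels=*/nullptr, values.data());

  writer->Close();
  return 0;
}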
We could potentially create adapter code to convert between the Parquet raw representation (arrays of data page values, repetition levels, and definition levels) and Avro/Protobuf data structures. What we've done in Arrow, since we will need a generic IO subsystem for many tasks (interacting with HDFS or other blob stores, for example), is put all of this in leaf libraries in apache/arrow (see the arrow::io and arrow::parquet namespaces). There isn't really an equivalent of Boost for C++ Apache projects, so arrow::io seemed like a fine place to put them; a rough sketch of the arrow::parquet write path is at the end of this message.

I'm getting back to SF from an international trip on the 16th, but I can meet with you in the later part of that day, and anyone else is welcome to join the discussion.

- Wes

On Wed, Aug 3, 2016 at 10:04 AM, Julien Le Dem <[email protected]> wrote:
> Yes, that would be another way to do it.
> The parquet-cpp / parquet-arrow integration / Arrow C++ efforts are closely related.
> Julien
>
>> On Aug 3, 2016, at 9:41 AM, Jim Pivarski <[email protected]> wrote:
>>
>> Related question: could I get ROOT's complex events into Parquet files
>> without inventing a Logical Type Definition by converting them to Apache
>> Arrow data structures in memory, and then letting the Arrow-Parquet
>> integration write those data structures to files?
>>
>> Arrow could provide side benefits, such as sharing data between ROOT's C++
>> framework and JVM-based applications without intermediate files, through
>> JNI. (Two birds with one stone.)
>>
>> -- Jim
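To make the Arrow path from the quoted question above concrete: build Arrow arrays and tables from the ROOT events in memory, then let the Parquet integration serialize the table. This is only a sketch; the integration's namespace and exact signatures have been in flux (arrow::parquet vs. parquet::arrow), so check the headers in your checkout:

// Hedged sketch of the Arrow-side write path: build Arrow arrays/tables in
// memory (e.g. converted from ROOT events), then hand the table to the
// Parquet integration to serialize. The integration namespace and signatures
// have moved around (arrow::parquet vs. parquet::arrow), so adjust to match
// your checkout; the "energy" column is a stand-in for real event data.
#include <arrow/api.h>
#include <arrow/io/file.h>
#include <parquet/arrow/writer.h>

#include <memory>

arrow::Status WriteEvents() {
  // Stand-in for data converted out of ROOT: one float64 column "energy".
  arrow::DoubleBuilder builder;
  ARROW_RETURN_NOT_OK(builder.AppendValues({12.5, 7.25, 99.0}));
  std::shared_ptr<arrow::Array> energy;
  ARROW_RETURN_NOT_OK(builder.Finish(&energy));

  auto schema = arrow::schema({arrow::field("energy", arrow::float64())});
  auto table = arrow::Table::Make(schema, {energy});

  ARROW_ASSIGN_OR_RAISE(auto sink,
                        arrow::io::FileOutputStream::Open("events.parquet"));

  // WriteTable handles the Arrow -> Parquet conversion, chunked into row
  // groups of 1024 rows here, without the caller touching rep/def levels.
  return parquet::arrow::WriteTable(*table, arrow::default_memory_pool(),
                                    sink, /*chunk_size=*/1024);
}

int main() {
  arrow::Status st = WriteEvents();
  return st.ok() ? 0 : 1;
}

The point of that path is that the caller never deals with repetition/definition levels directly; the mapping from nested Arrow structures to Parquet's columnar layout is handled by the integration layer.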
