Hi Wes,

We have a rough implementation of this conversion (currently from rapidjson to Parquet) that we could contribute. It will need a shepherd/guide to ensure it aligns with parquet-cpp's implementation standards.
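For concreteness, here is a minimal sketch of what such a conversion can look like against parquet-cpp's existing low-level WriteBatch API. The one-column schema, the "value" field name, and the flat JSON shape are simplified placeholders for illustration, not our actual code:

#include <memory>
#include <vector>

#include <arrow/io/api.h>
#include <parquet/api/writer.h>
#include <rapidjson/document.h>

// Build a one-column schema: message record { required int64 value; }
// (placeholder schema; a real converter would derive this from the JSON)
std::shared_ptr<parquet::schema::GroupNode> MakeSchema() {
  parquet::schema::NodeVector fields;
  fields.push_back(parquet::schema::PrimitiveNode::Make(
      "value", parquet::Repetition::REQUIRED, parquet::Type::INT64));
  return std::static_pointer_cast<parquet::schema::GroupNode>(
      parquet::schema::GroupNode::Make("record", parquet::Repetition::REQUIRED,
                                       fields));
}

// Convert a JSON array like [{"value": 1}, {"value": 2}] into a
// single-row-group Parquet file written to the given sink.
void WriteJsonArray(const rapidjson::Document& doc,
                    std::shared_ptr<::arrow::io::OutputStream> sink) {
  auto writer = parquet::ParquetFileWriter::Open(sink, MakeSchema());
  parquet::RowGroupWriter* rg = writer->AppendRowGroup();
  auto* col = static_cast<parquet::Int64Writer*>(rg->NextColumn());

  // Gather the column values from each JSON record, then write one batch.
  std::vector<int64_t> values;
  for (const auto& rec : doc.GetArray()) {
    values.push_back(rec["value"].GetInt64());
  }
  // REQUIRED column: no definition/repetition levels are needed.
  col->WriteBatch(static_cast<int64_t>(values.size()),
                  /*def_levels=*/nullptr, /*rep_levels=*/nullptr,
                  values.data());
  writer->Close();
}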
Does the class structure in parquet-cpp have to be in one-to-one
correspondence with parquet-mr? I noticed that the parquet-mr Record
Conversion API has abstract classes like WriteSupport, ReadSupport,
PrimitiveConverter, GroupConverter, RecordMaterializer,
ParquetInputFormat, and ParquetOutputFormat which have to be
implemented. I saw that these classes are currently implemented by the
Avro, Thrift, and Protobuf converters (e.g.
https://github.com/apache/parquet-mr/tree/master/parquet-avro/src/main/java/org/apache/parquet/avro).
Would parquet-cpp require the exact same framework? (A rough sketch of
what these interfaces might look like in C++ is appended below the
quoted thread.)

-Sandeep

On Thu, Nov 2, 2017 at 8:27 PM, Wes McKinney <[email protected]> wrote:
> hi Sandeep,
>
> This is more than welcome to be implemented, though I personally have
> no need for it (I almost exclusively work with columnar data / Arrow).
> In addition to implementing the decoding to records, we would need to
> define a suitable record data structure in C++, which is a decent
> amount of work.
>
> - Wes
>
> On Thu, Nov 2, 2017 at 3:38 AM, Sandeep Joshi <[email protected]> wrote:
> > The parquet-mr version has the Record Conversion API
> > (RecordMaterializer, RecordConsumer) which can be used to convert
> > rows/tuples to and from the Parquet columnar format.
> >
> > https://github.com/apache/parquet-mr/tree/master/parquet-column/src/main/java/org/apache/parquet/io/api
> >
> > Are there any plans to add the same functionality to the parquet-cpp
> > codebase?
> >
> > I checked the JIRA and couldn't find any outstanding issue, although
> > the github README does say "The 3rd layer would handle
> > reading/writing records."
> > https://github.com/apache/parquet-cpp/blob/master/README.md
> >
> > -Sandeep
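P.S. To make the question above concrete, here is a rough, hypothetical
transliteration of the parquet-mr converter interfaces into C++. None of
these classes exist in parquet-cpp today; the names and signatures simply
mirror the org.apache.parquet.io.api classes:

#include <cstdint>
#include <string>

// Hypothetical C++ analogues of the parquet-mr record-assembly classes.
// Nothing here exists in parquet-cpp; this only sketches the shape of
// the parquet-mr API for discussion.
namespace parquet {

class Converter {
 public:
  virtual ~Converter() = default;
  virtual bool IsPrimitive() const = 0;
};

// Receives decoded primitive values for one leaf column during assembly.
class PrimitiveConverter : public Converter {
 public:
  bool IsPrimitive() const override { return true; }
  virtual void AddInt64(int64_t value) = 0;
  virtual void AddBinary(const std::string& value) = 0;
  // ... AddBoolean, AddDouble, etc. for the remaining physical types
};

// Routes values for a nested (group) field to per-child converters.
class GroupConverter : public Converter {
 public:
  bool IsPrimitive() const override { return false; }
  virtual Converter* GetConverter(int field_index) = 0;
  virtual void Start() = 0;  // a new instance of this group begins
  virtual void End() = 0;    // the current group instance is complete
};

// Assembles one materialized record of type T from converter callbacks.
template <typename T>
class RecordMaterializer {
 public:
  virtual ~RecordMaterializer() = default;
  virtual GroupConverter* GetRootConverter() = 0;
  virtual T GetCurrentRecord() = 0;
};

}  // namespace parquet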
