Hi Wes,

We have a rough implementation of this conversion, currently from
RapidJSON documents to Parquet, that we could contribute. It will need a
shepherd/guide to ensure it aligns with the parquet-cpp implementation
standards.
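
To make this concrete, here is a minimal sketch of that kind of
conversion, written against parquet-cpp's low-level ColumnWriter API as
it stands in the current tree. The two-column schema, the field names,
and the inline JSON are made up for illustration, and error handling is
omitted:

#include <memory>
#include <vector>

#include <arrow/io/file.h>
#include <parquet/api/writer.h>
#include <parquet/exception.h>
#include <rapidjson/document.h>

int main() {
  // Made-up two-column schema: required int64 "id", required double "score".
  parquet::schema::NodeVector fields;
  fields.push_back(parquet::schema::PrimitiveNode::Make(
      "id", parquet::Repetition::REQUIRED, parquet::Type::INT64));
  fields.push_back(parquet::schema::PrimitiveNode::Make(
      "score", parquet::Repetition::REQUIRED, parquet::Type::DOUBLE));
  auto schema = std::static_pointer_cast<parquet::schema::GroupNode>(
      parquet::schema::GroupNode::Make(
          "schema", parquet::Repetition::REQUIRED, fields));

  // Parse a JSON array of records and buffer the values column by
  // column, since Parquet writes one column at a time per row group.
  const char* json = R"([{"id": 1, "score": 0.5}, {"id": 2, "score": 1.5}])";
  rapidjson::Document doc;
  doc.Parse(json);
  std::vector<int64_t> ids;
  std::vector<double> scores;
  for (const auto& rec : doc.GetArray()) {
    ids.push_back(rec["id"].GetInt64());
    scores.push_back(rec["score"].GetDouble());
  }

  // Write both column buffers into a single row group.
  PARQUET_ASSIGN_OR_THROW(
      auto out, arrow::io::FileOutputStream::Open("records.parquet"));
  auto writer = parquet::ParquetFileWriter::Open(out, schema);
  parquet::RowGroupWriter* rg = writer->AppendRowGroup();
  static_cast<parquet::Int64Writer*>(rg->NextColumn())
      ->WriteBatch(ids.size(), nullptr, nullptr, ids.data());
  static_cast<parquet::DoubleWriter*>(rg->NextColumn())
      ->WriteBatch(scores.size(), nullptr, nullptr, scores.data());
  writer->Close();
  return 0;
}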

Does the class structure in parquet-cpp have to be in one-to-one
correspondence with parquet-mr's?

I noticed that the parquet-mr Record Conversion API has abstract classes
like WriteSupport, ReadSupport, PrimitiveConverter, GroupConverter,
RecordMaterializer, ParquetInputFormat, and ParquetOutputFormat which
have to be implemented. I saw that these classes are currently
implemented by the Avro, Thrift, and Protobuf converters (e.g.
https://github.com/apache/parquet-mr/tree/master/parquet-avro/src/main/java/org/apache/parquet/avro
).

Would parquet-cpp require the exact same framework?
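
(Purely to make that question concrete: if parquet-cpp were to mirror
those interfaces, I would imagine something along these lines. None of
these classes exist in parquet-cpp today; the names simply shadow their
Java counterparts.)

#include <cstdint>
#include <string>

// Receives leaf values during record assembly (cf. parquet-mr's
// PrimitiveConverter).
class PrimitiveConverter {
 public:
  virtual ~PrimitiveConverter() = default;
  virtual void AddInt64(int64_t value) = 0;
  virtual void AddDouble(double value) = 0;
  virtual void AddBinary(const std::string& value) = 0;
};

// Marks the start/end of nested groups (cf. GroupConverter).
class GroupConverter {
 public:
  virtual ~GroupConverter() = default;
  virtual void Start() = 0;
  virtual void End() = 0;
};

// Produces one materialized record of type T after a record's worth of
// values has been delivered (cf. RecordMaterializer<T>).
template <typename T>
class RecordMaterializer {
 public:
  virtual ~RecordMaterializer() = default;
  virtual GroupConverter* GetRootConverter() = 0;
  virtual T GetCurrentRecord() = 0;
};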

-Sandeep

On Thu, Nov 2, 2017 at 8:27 PM, Wes McKinney <[email protected]> wrote:

> hi Sandeep,
>
> This is more than welcome to be implemented, though I personally have
> no need for it (almost exclusively work with columnar data / Arrow).
> In addition to implementing the decoding to records, we would need to
> define a suitable record data structure in C++, which is a decent
> amount of work.
>
> - Wes
>
> On Thu, Nov 2, 2017 at 3:38 AM, Sandeep Joshi <[email protected]> wrote:
> > The parquet-mr version has the Record Conversion API
> > (RecordMaterializer, RecordConsumer), which can be used to convert
> > rows/tuples to and from the Parquet columnar format.
> >
> > https://github.com/apache/parquet-mr/tree/master/parquet-column/src/main/java/org/apache/parquet/io/api
> >
> > Are there any plans to add the same functionality to the parquet-cpp
> > codebase?
> >
> > I checked JIRA and couldn't find any outstanding issue, although the
> > GitHub README does say "The 3rd layer would handle reading/writing
> > records."
> > https://github.com/apache/parquet-cpp/blob/master/README.md/
> >
> > -Sandeep
>