Using Parquet to convert and store JSON documents

James Pirz Tue, 11 Oct 2016 17:31:58 -0700

Hello,

I am using C++ and need to convert a corpus of JSON documents, whose
schema is not fixed or known in advance, into Parquet format for efficient
processing and storage. I have gone through a number of examples and test
cases to get an idea of the best way to do it; however, I am still confused.
I believe I need to use ParquetWriter and ParquetReader, and I am basically
trying to understand:


1- Is it really a requirement to use Avro, Thrift, or Protobuf for this
purpose (all the examples seem to use one of them)? I know the schema
information needs to be stored in the footer of Parquet files, but does that
mean I need to know the schema ahead of time? Do I have to use one of those
three to hold an in-memory representation of my objects, or can I feed the
parsed JSON documents directly into a ParquetWriter? (Using Avro, Thrift, or
Protobuf creates an extra dependency that I am really trying to avoid.)
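Question 1 hinges on the fact that a Parquet file needs a concrete column
schema at write time, even when the JSON input has none. One way to get there
without Avro, Thrift, or Protobuf is a two-pass conversion: first scan all
documents to build a union schema, then pivot each record into per-column
buffers, using nulls for missing fields. The C++17 sketch below illustrates
only that idea with the standard library. It assumes the JSON has already been
parsed and flattened into dotted paths by some JSON parser, and the names
`FlatDoc`, `infer_schema`, and `to_columns` are hypothetical. It does not call
any parquet-cpp API; the resulting columns would still have to be handed to a
Parquet writer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical representation: a JSON document already flattened by some
// parser into dotted paths ("user.name") mapped to scalar values.
using Scalar = std::variant<std::int64_t, double, bool, std::string>;
using FlatDoc = std::map<std::string, Scalar>;

// Column types a columnar writer would need to declare up front.
enum class ColType { Int64, Double, Bool, String };

ColType type_of(const Scalar& v) {
  switch (v.index()) {
    case 0: return ColType::Int64;
    case 1: return ColType::Double;
    case 2: return ColType::Bool;
    default: return ColType::String;
  }
}

// Widen two observed types into one column type: int64 + double -> double;
// any other mismatch falls back to string (an assumed policy).
ColType unify(ColType a, ColType b) {
  if (a == b) return a;
  if ((a == ColType::Int64 && b == ColType::Double) ||
      (a == ColType::Double && b == ColType::Int64)) return ColType::Double;
  return ColType::String;
}

// Pass 1: scan every document and build the union schema.
std::map<std::string, ColType> infer_schema(const std::vector<FlatDoc>& docs) {
  std::map<std::string, ColType> schema;
  for (const auto& doc : docs) {
    for (const auto& [path, value] : doc) {
      ColType t = type_of(value);
      auto it = schema.find(path);
      if (it == schema.end()) schema.emplace(path, t);
      else it->second = unify(it->second, t);
    }
  }
  return schema;
}

// Render a scalar as text, used when a column was widened to string.
std::string to_text(const Scalar& v) {
  switch (v.index()) {
    case 0: return std::to_string(std::get<0>(v));
    case 1: return std::to_string(std::get<1>(v));
    case 2: return std::get<2>(v) ? "true" : "false";
    default: return std::get<3>(v);
  }
}

// Pass 2: pivot rows into one optional-valued column per schema field.
// A missing field becomes std::nullopt, i.e. a null in an OPTIONAL column.
std::map<std::string, std::vector<std::optional<Scalar>>> to_columns(
    const std::vector<FlatDoc>& docs,
    const std::map<std::string, ColType>& schema) {
  std::map<std::string, std::vector<std::optional<Scalar>>> cols;
  for (const auto& [path, type] : schema) {
    auto& col = cols[path];
    for (const auto& doc : docs) {
      auto it = doc.find(path);
      if (it == doc.end()) { col.push_back(std::nullopt); continue; }
      const Scalar& v = it->second;
      if (type == ColType::String) {
        col.push_back(Scalar{to_text(v)});
      } else if (type == ColType::Double && v.index() == 0) {
        col.push_back(Scalar{static_cast<double>(std::get<0>(v))});
      } else {
        col.push_back(v);
      }
    }
    // Every column must have exactly one entry (value or null) per document.
    assert(col.size() == docs.size());
  }
  return cols;
}
```

Widening conflicting types to string is only one possible policy; a real
converter might instead reject conflicts, or map nested JSON objects and arrays
to Parquet groups and repeated fields rather than flattening them.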

2- Almost all the examples I found are written in Java. I am using C++ and
am really looking for an example in that context. I have looked at a couple
of test cases in the parquet-cpp repo, but I am wondering whether a succinct
C++ example of such a conversion is available.

Any hint or suggestion would be highly appreciated.

Thnx.
James