Hello, I am using C++ and need to convert a corpus of JSON documents, whose schema is not fixed or known in advance, into Parquet format for efficient processing and storage. I have gone through a number of examples and test cases to find the best way to do this, but I am still confused. I believe I need to use ParquetWriter and ParquetReader, and I am trying to understand the following:
1- Is it really a requirement to use Avro, Thrift or Protobuf for this purpose? (All the examples seem to use one of them.) I know the schema information needs to be stored in the footer of Parquet files, but does that mean I need to know the schema ahead of time? And do I have to use one of those three to hold an in-memory representation of my objects, or can I feed the parsed JSON documents directly into a ParquetWriter? (Using Avro, Thrift or Protobuf creates an extra dependency, which I am really trying to avoid.)

2- Almost all the examples I found are in Java. I am using C++ and am looking for an example in that context. I have looked at a couple of test cases in the parquet-cpp repo, but I am wondering whether a succinct C++ example of such a conversion is available.

Any hint or suggestion would be highly appreciated.

Thanks,
James
