This might seem like a dumb question, but I am not yet familiar enough with
the API to figure out how to get around this problem.

I have a pre-defined Arrow schema which I convert to a Parquet schema using
the "ToParquetSchema" function.  This returns a SchemaDescriptor object.
https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/schema.h#L80

ParquetFileWriter, on the other hand, expects a shared_ptr<GroupNode>
https://github.com/apache/parquet-cpp/blob/master/src/parquet/file/writer.h#L126

SchemaDescriptor only exposes the GroupNode as a raw pointer, but to pass it
to ParquetFileWriter I need a shared_ptr.  This introduces memory management
complications.  I'd rather not create a copy of the GroupNode just to pass
it to ParquetFileWriter.

  // convert arrow schema to parquet schema
  std::shared_ptr<SchemaDescriptor> parquet_schema;
  std::shared_ptr<::parquet::WriterProperties> properties =
    ::parquet::default_writer_properties();
  ToParquetSchema(arrow_sch.get(), *properties.get(), &parquet_schema);

  // write arrow table to parquet
  parquet::schema::GroupNode* g =
    (parquet::schema::GroupNode*)parquet_schema->group_node();
  std::shared_ptr<parquet::schema::GroupNode> grp_node;
  grp_node.reset(g);  // Don't want to do this!
  std::shared_ptr<::arrow::io::FileOutputStream> sink;
  ::arrow::io::FileOutputStream::Open(path, &sink);
  std::unique_ptr<FileWriter> arrow_writer(
    new FileWriter(pool, ParquetFileWriter::Open(sink, grp_node)));

  arrow_writer->WriteTable(*new_table_ptr.get(), 65536);
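
The only non-copy workaround I can think of is to replace the reset above
with a non-owning shared_ptr that has a no-op deleter, so the
SchemaDescriptor keeps ownership of the node.  Something roughly like this
(the "non_owning" lambda is just something I made up for illustration):

  // Wrap the raw pointer without taking ownership: the empty deleter
  // means the shared_ptr never frees the node, so parquet_schema must
  // outlive the writer.
  auto non_owning = [](parquet::schema::GroupNode* node) {
    return std::shared_ptr<parquet::schema::GroupNode>(
      node, [](parquet::schema::GroupNode*) { /* no-op deleter */ });
  };

  std::shared_ptr<parquet::schema::GroupNode> grp_node = non_owning(
    (parquet::schema::GroupNode*)parquet_schema->group_node());

  // grp_node can now be passed to ParquetFileWriter::Open(sink, grp_node)
  // without the shared_ptr ever deleting the node.

But that feels fragile, and I'm not sure it's an intended usage.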

Is this an API limitation that no one has hit before, or am I missing a
better way of writing Parquet files given a pre-defined Arrow schema?

-Sandeep
