This might seem like a dumb question, but I am not yet familiar enough with the API to figure out how to get around this problem.
I have a pre-defined Arrow schema which I convert to a Parquet schema using the ToParquetSchema() function. This returns a SchemaDescriptor object:

https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/schema.h#L80

ParquetFileWriter, on the other hand, expects a shared_ptr<GroupNode>:

https://github.com/apache/parquet-cpp/blob/master/src/parquet/file/writer.h#L126

SchemaDescriptor can return a raw pointer to the GroupNode, but to pass it to ParquetFileWriter I need a shared_ptr. This introduces memory-management complications: if I wrap the raw pointer in a shared_ptr (as below), that shared_ptr assumes ownership of a node the SchemaDescriptor still owns, which looks like a recipe for a double free. I'd also rather not create a copy of the GroupNode just to pass it to ParquetFileWriter.

    // convert arrow schema to parquet schema
    std::shared_ptr<SchemaDescriptor> parquet_schema;
    std::shared_ptr<::parquet::WriterProperties> properties =
        ::parquet::default_writer_properties();
    ToParquetSchema(arrow_sch.get(), *properties, &parquet_schema);

    // write arrow table to parquet
    parquet::schema::GroupNode* g =
        (parquet::schema::GroupNode*)parquet_schema->group_node();
    grp_node.reset(g);  // Don't want to do this!

    std::shared_ptr<::arrow::io::FileOutputStream> sink;
    ::arrow::io::FileOutputStream::Open(path, &sink);

    std::unique_ptr<FileWriter> arrow_writer(
        new FileWriter(pool, ParquetFileWriter::Open(sink, grp_node)));
    arrow_writer->WriteTable(*new_table_ptr, 65536);

Is this an API limitation that no one has hit before? Or am I missing a better way of writing Parquet files given a pre-defined Arrow schema?

-Sandeep
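P.S. One workaround I have been considering (a generic C++ sketch, not parquet-cpp specific, and it assumes the SchemaDescriptor outlives the writer) is a non-owning shared_ptr with a no-op deleter, so the shared_ptr never frees a node it doesn't own:

```cpp
#include <memory>

struct Node { int value; };  // stand-in for the GroupNode

// Reads a value through a non-owning shared_ptr "view" of a Node whose
// lifetime is managed elsewhere (here, by a unique_ptr).
int read_through_non_owning_view() {
    std::unique_ptr<Node> owner(new Node{42});  // the real owner
    // No-op deleter: this shared_ptr does not delete the Node, so there
    // is no double free when both smart pointers go out of scope.
    std::shared_ptr<Node> view(owner.get(), [](Node*) { /* no-op */ });
    return view->value;
}
```

But I'm not sure whether ParquetFileWriter expects to share ownership of the node, in which case this would be fragile.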
