I think you want to use parquet::arrow::FileWriter::Open

https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.h#L112

The implementation is here:

https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L992
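
That overload takes the ::arrow::Schema directly and does the Parquet schema conversion internally, so you never have to touch the GroupNode or its ownership. Going from the writer.h declarations, something like this untested sketch should work (the helper name WriteArrowTable is mine, and signatures may differ slightly depending on your parquet-cpp version):

```cpp
// Untested sketch against parquet-cpp master (late 2017 API).
#include <memory>
#include <string>

#include <arrow/io/file.h>
#include <arrow/memory_pool.h>
#include <arrow/status.h>
#include <arrow/table.h>
#include <parquet/arrow/writer.h>
#include <parquet/properties.h>

// Write an Arrow table to a Parquet file, letting FileWriter::Open
// convert the Arrow schema to a Parquet schema internally.
::arrow::Status WriteArrowTable(const std::shared_ptr<::arrow::Table>& table,
                                const std::string& path) {
  std::shared_ptr<::arrow::io::FileOutputStream> sink;
  ::arrow::Status st = ::arrow::io::FileOutputStream::Open(path, &sink);
  if (!st.ok()) return st;

  // No ToParquetSchema / GroupNode juggling: pass the Arrow schema directly.
  std::unique_ptr<parquet::arrow::FileWriter> writer;
  st = parquet::arrow::FileWriter::Open(*table->schema(),
                                        ::arrow::default_memory_pool(), sink,
                                        ::parquet::default_writer_properties(),
                                        &writer);
  if (!st.ok()) return st;

  st = writer->WriteTable(*table, 65536);
  if (!st.ok()) return st;
  return writer->Close();
}
```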

- Wes

On Sun, Nov 26, 2017 at 8:25 AM, Sandeep Joshi <[email protected]> wrote:
> This might seem like a dumb question, but I am not yet familiar enough
> with the API to figure out how to get around this problem.
>
> I have a pre-defined Arrow Schema which I convert to Parquet Schema using
> the "ToParquetSchema" function.  This returns a SchemaDescriptor object.
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/schema.h#L80
>
> ParquetFileWriter on the other hand, expects a shared_ptr<GroupNode>
> https://github.com/apache/parquet-cpp/blob/master/src/parquet/file/writer.h#L126
>
> SchemaDescriptor can return a raw pointer to the GroupNode, but to pass it
> to ParquetFileWriter I need a shared_ptr. This introduces memory-management
> complications. I'd rather not create a copy of the GroupNode just to pass
> it to ParquetFileWriter.
>
>   // convert arrow schema to parquet schema
>   std::shared_ptr<SchemaDescriptor> parquet_schema;
>   std::shared_ptr<::parquet::WriterProperties> properties =
>     ::parquet::default_writer_properties();
>   ToParquetSchema(arrow_sch.get(), *properties.get(), &parquet_schema);
>
>   // write arrow table to parquet
>   parquet::schema::GroupNode* g =
>     (parquet::schema::GroupNode*)parquet_schema->group_node();
>   grp_node.reset(g);  // Don't want to do this!
>   std::shared_ptr<::arrow::io::FileOutputStream> sink;
>   ::arrow::io::FileOutputStream::Open(path, &sink);
>   std::unique_ptr<FileWriter> arrow_writer(
>     new FileWriter(pool, ParquetFileWriter::Open(sink, grp_node)));
>
>   arrow_writer->WriteTable(*new_table_ptr.get(), 65536);
>
> Is this an API limitation that no one has hit before, or am I missing a
> better way of writing Parquet files given a pre-defined Arrow schema?
>
> -Sandeep
