You can see some sample usages of this code in the Cython wrappers for Python:
https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pyx

On Mon, Nov 27, 2017 at 6:58 AM, Sandeep Joshi <[email protected]> wrote:
> Thanks! Is there sample code on how to use these APIs to learn best
> practices?
>
> I am looking at
> https://github.com/apache/arrow/tree/master/cpp/src/arrow/python
> but that only covers Arrow itself.
>
> -Sandeep
>
> On Sun, Nov 26, 2017 at 9:57 PM, Wes McKinney <[email protected]> wrote:
>
>> I think you want to use parquet::arrow::FileWriter::Open
>>
>> https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.h#L112
>>
>> The implementation is here:
>>
>> https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/writer.cc#L992
>>
>> - Wes
>>
>> On Sun, Nov 26, 2017 at 8:25 AM, Sandeep Joshi <[email protected]> wrote:
>> > This might seem like a dumb question, but I am not familiar enough with
>> > the API yet to figure out how to get around this problem.
>> >
>> > I have a pre-defined Arrow schema which I convert to a Parquet schema
>> > using the "ToParquetSchema" function. This returns a SchemaDescriptor
>> > object:
>> > https://github.com/apache/parquet-cpp/blob/master/src/parquet/arrow/schema.h#L80
>> >
>> > ParquetFileWriter, on the other hand, expects a shared_ptr<GroupNode>:
>> > https://github.com/apache/parquet-cpp/blob/master/src/parquet/file/writer.h#L126
>> >
>> > SchemaDescriptor can return a raw pointer to the GroupNode, but to pass
>> > it to ParquetFileWriter I need a shared_ptr. This introduces memory
>> > management complications. I'd rather not create a copy of the GroupNode
>> > in order to pass it to ParquetFileWriter.
>> >
>> >   // convert arrow schema to parquet schema
>> >   std::shared_ptr<SchemaDescriptor> parquet_schema;
>> >   std::shared_ptr<::parquet::WriterProperties> properties =
>> >       ::parquet::default_writer_properties();
>> >   ToParquetSchema(arrow_sch.get(), *properties.get(), &parquet_schema);
>> >
>> >   // write arrow table to parquet
>> >   parquet::schema::GroupNode* g =
>> >       (parquet::schema::GroupNode*)parquet_schema->group_node();
>> >   grp_node.reset(g);  // Don't want to do this!
>> >   std::shared_ptr<::arrow::io::FileOutputStream> sink;
>> >   ::arrow::io::FileOutputStream::Open(path, &sink);
>> >   std::unique_ptr<FileWriter> arrow_writer(
>> >       new FileWriter(pool, ParquetFileWriter::Open(sink, grp_node)));
>> >
>> >   arrow_writer->WriteTable(*new_table_ptr.get(), 65536);
>> >
>> > Is this an API limitation that no one has hit before? Or am I missing a
>> > better way of writing Parquet files given a pre-defined Arrow schema?
>> >
>> > -Sandeep
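
For anyone landing on this thread later, here is a minimal sketch of the route Wes
suggests above: going through parquet::arrow::FileWriter::Open, which converts the
Arrow schema to a Parquet schema internally, so no SchemaDescriptor or GroupNode
has to be managed (or wrapped in a shared_ptr) by the caller. The `table` and
`path` variables are placeholders for the caller's data, the signatures follow the
parquet-cpp headers linked above as of this thread, and error handling is
abbreviated.

  #include <arrow/api.h>
  #include <arrow/io/file.h>
  #include <parquet/arrow/writer.h>

  // `table` is an already-built std::shared_ptr<::arrow::Table>; `path` is the
  // output file path. Both are placeholders here.
  ::arrow::Status st;

  // Open the output file.
  std::shared_ptr<::arrow::io::FileOutputStream> sink;
  st = ::arrow::io::FileOutputStream::Open(path, &sink);

  // FileWriter::Open derives the Parquet schema from the Arrow schema itself.
  std::unique_ptr<parquet::arrow::FileWriter> writer;
  st = parquet::arrow::FileWriter::Open(*table->schema(),
                                        ::arrow::default_memory_pool(), sink,
                                        ::parquet::default_writer_properties(),
                                        &writer);

  // Write the table in row groups of 65536 rows, then close the file.
  st = writer->WriteTable(*table, 65536);
  st = writer->Close();

The same header also exposes a parquet::arrow::WriteTable convenience function
that wraps these steps into a single call when the table is already assembled.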
