Rajeshwar Agrawal created ARROW-3949:
----------------------------------------
Summary: parquet cpp - improve examples
Key: ARROW-3949
URL: https://issues.apache.org/jira/browse/ARROW-3949
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Rajeshwar Agrawal
It would be a great help to have examples of using the parquet arrow high-level
API for the following two cases:
* Storing nested data types (support for nested data is touted as a major
merit of parquet, so I think this case should be included as an example).
Ideally, an example of how to use a StructArray nested with several primitive
types, list types and other struct types would cover every case of a nested
hierarchy of complex data representations
* Buffered or batched writes to a parquet file. Parquet is meant to be used for
large amounts of data. The current examples hold all of the data in arrow
data structures before writing to the parquet file. It would be great to include
an example of batched writes, which is helpful in most use cases of parquet. The
current example creates an {{arrow::Table}}, which needs to be filled with
{{arrow::Array}}(s) holding the entire data. Ideally, an example would generate
some data in several {{arrow::Array}}(s), and then store (append) them as a new
Row Group (or Column Chunk) in an existing (or new) parquet file writer, using
the {{NewRowGroup}} and {{WriteColumnChunk}} functions, thus demonstrating a
lower memory footprint for writing a parquet file