[ 
https://issues.apache.org/jira/browse/PARQUET-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney moved ARROW-3949 to PARQUET-1526:
----------------------------------------------

    Component/s:     (was: C++)
                 parquet-cpp
       Workflow: patch-available, re-open possible  (was: jira)
            Key: PARQUET-1526  (was: ARROW-3949)
        Project: Parquet  (was: Apache Arrow)

> [C++] parquet cpp - improve examples
> ------------------------------------
>
>                 Key: PARQUET-1526
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1526
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Rajeshwar Agrawal
>            Priority: Minor
>
> It would be great to have examples of using the Parquet Arrow high-level API 
> for the following two cases:
>  * Storing nested data types (support for nested data types is touted as a 
> major merit of Parquet, so I think this case should be included as an example). 
> Ideally, an example of an {{arrow::StructArray}} nesting several 
> primitive types, list types, and other nested types would cover every case of 
> a nested hierarchy of complex data representations.
>  * Buffered or batched writes to a Parquet file. Parquet is meant to be used 
> for large amounts of data. The current example stores all of the data in 
> Arrow data structures before writing to the Parquet file, giving it a memory 
> footprint proportional to the amount of data being stored. An example 
> of writing directly to row groups and columns could nicely demonstrate how to 
> store data with a smaller memory footprint. The current example creates an 
> {{arrow::Table}}, which must be filled with {{arrow::Array}}(s) holding the 
> entire data set, whose size is bounded by the amount of RAM. Ideally, an 
> example would generate some data in several {{arrow::Array}}(s) and then store 
> (append) them as new Row Groups (or Column Chunks) in a 
> {{parquet::arrow::FileWriter}}, using the {{NewRowGroup}} and 
> {{WriteColumnChunk}} functions, thus demonstrating a lower memory footprint 
> for writing a Parquet file with huge amounts of data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
