[ 
https://issues.apache.org/jira/browse/PARQUET-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901509#comment-16901509
 ] 

David Lee commented on PARQUET-1626:
------------------------------------

I'm appending RowGroups using pyarrow today.

Open a new parquet file

For each row group in File 1:

    read row group. write row group to new file

For each row group in File 2:

    read row group. write row group to new file

Close new parquet file

There is no need to touch the metadata, since those statistics are stored at 
the row group level.

I usually generate parquet files of 30 to 40 MB each and merge them 
afterwards to match the HDFS block size.
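
One way to decide which small files to merge together is to batch them greedily against a target size. The sketch below is only an illustration: the name plan_merge_batches is hypothetical, and the 128 MiB default is an assumed block size, not a value taken from this comment. The merged file will not be exactly the sum of its inputs, so the target is approximate.

```python
def plan_merge_batches(file_sizes, target_bytes=128 * 1024 * 1024):
    """Group (path, size_in_bytes) pairs into batches for merging.

    Hypothetical helper. Files are taken in the given order; a new batch
    starts when adding the next file would exceed target_bytes. A single
    file larger than the target ends up in a batch of its own.
    """
    batches = []
    current, current_size = [], 0
    for path, size in file_sizes:
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

The sizes could come from os.path.getsize on local files, or from whatever listing the file system provides; each resulting batch is then a list of paths to merge into one output file.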

> [C++] Ability to concat parquet files 
> --------------------------------------
>
>                 Key: PARQUET-1626
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1626
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-cpp
>    Affects Versions: cpp-1.3.1
>            Reporter: nileema shingte
>            Priority: Major
>              Labels: features
>
> Ability to concat the parquet files is something we've wanted for some time 
> too. When we generate parquet files partitioned by an expression, we often 
> end up with tiny files and would like to add a post-processing step to concat 
> these files together.
> Is there a plan to add this ability to the library any time soon? 
> If not, it would be great if someone can provide a somewhat detailed 
> pseudocode (expanding on what [~xhochy] mentioned in the comment in 
> PARQUET-1022) as a guideline for conditions/scenarios that need to be handled 
> with extra care, so we can contribute this as a PR. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
