[ 
https://issues.apache.org/jira/browse/PARQUET-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791492#comment-16791492
 ] 

Uwe L. Korn commented on PARQUET-1022:
--------------------------------------

There is no implementation of merging (concatenating) files in C++ yet, but that 
would be much easier to implement than an append mode. To merge files you 
would read the footers of all files, copy the binary content of the RowGroups 
into the new file, and then compute a new footer from the information in the 
existing footers. As far as I can tell, this should not require decoding any 
RowGroups, so an explicit merge will be a lot faster than materialising the 
data first and writing a completely new file.
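
To make the byte-copy step concrete, here is a rough sketch that uses only the 
standard library and treats each file as an in-memory string. It relies on the 
documented file layout ("PAR1", row group data, footer, 4-byte little-endian 
footer length, "PAR1"). The names FileLayout, LocateFooter and 
AppendRowGroupBytes are made up for this illustration and are not part of 
parquet-cpp. Parsing and re-serialising the Thrift footer is not shown, and 
encrypted files are not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper type (not a parquet-cpp class).
// Layout assumed: "PAR1" | row group data | footer | footer length (4 bytes LE) | "PAR1"
struct FileLayout {
  std::size_t data_begin;    // first byte after the leading magic
  std::size_t data_end;      // one past the last data byte, i.e. where the footer starts
  std::uint32_t footer_len;  // size of the Thrift-encoded footer in bytes
};

// Finds the footer using only the trailing 8 bytes; no row group is decoded.
inline FileLayout LocateFooter(const std::string& bytes) {
  const std::size_t kMagicLen = 4;
  const std::size_t kTrailerLen = 8;  // footer length + trailing magic
  if (bytes.size() < kMagicLen + kTrailerLen ||
      bytes.compare(0, kMagicLen, "PAR1") != 0 ||
      bytes.compare(bytes.size() - kMagicLen, kMagicLen, "PAR1") != 0) {
    throw std::invalid_argument("missing PAR1 magic");
  }
  const std::size_t len_pos = bytes.size() - kTrailerLen;
  std::uint32_t footer_len = 0;
  for (int i = 3; i >= 0; --i) {  // assemble the little-endian length
    footer_len = (footer_len << 8) |
                 static_cast<unsigned char>(bytes[len_pos + i]);
  }
  if (footer_len > len_pos - kMagicLen) {
    throw std::invalid_argument("footer length exceeds file size");
  }
  return FileLayout{kMagicLen, len_pos - footer_len, footer_len};
}

// Copies the undecoded data region of `src` to the end of `dst` (which must
// already start with "PAR1"). Returns the amount to add to every absolute
// file offset taken from the footer of `src` when building the merged footer.
inline std::int64_t AppendRowGroupBytes(const std::string& src,
                                        std::string* dst) {
  const FileLayout layout = LocateFooter(src);
  const std::int64_t shift = static_cast<std::int64_t>(dst->size()) -
                             static_cast<std::int64_t>(layout.data_begin);
  dst->append(src, layout.data_begin, layout.data_end - layout.data_begin);
  return shift;
}
```

A merge would start the output with "PAR1", call AppendRowGroupBytes once per 
input file, and then write a new footer whose row group entries are the 
concatenation of the input entries with their file offsets moved by the 
returned shift, followed by the new footer length and the closing "PAR1".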

> [C++] Append mode in parquet-cpp
> --------------------------------
>
>                 Key: PARQUET-1022
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1022
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-cpp
>    Affects Versions: cpp-1.1.0
>            Reporter: yugu
>            Assignee: Wes McKinney
>            Priority: Major
>
> As said, I am currently trying to work out an append feature for Parquet 
> files in C++.
> (I have been searching through the repo etc. but can't find an example.)
> Current solution (assuming no schema changes):
> Read in metadata
> Change metadata based on appended rows+ original rows
> Append a new row group (or multiple row group writer)
> Write the new rows.
> ---
> The problem is that, if approached this way, the original last row group may 
> not be completely filled. I was wondering if there is a fix or whether I'm 
> using the API wrong...
> Thanks ! : D



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
