[
https://issues.apache.org/jira/browse/ARROW-18171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627736#comment-17627736
]
Miles Granger commented on ARROW-18171:
---------------------------------------
[Relevant SO discussion |
https://stackoverflow.com/questions/47113813/using-pyarrow-how-do-you-append-to-parquet-file]
> Feature to append row groups to existing parquet file
> -----------------------------------------------------
>
> Key: ARROW-18171
> URL: https://issues.apache.org/jira/browse/ARROW-18171
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Parquet, Python
> Reporter: Nischith
> Priority: Minor
>
> This is related to pyarrow.
> Right now, it is possible to append row groups to a Parquet file only while
> the writer is open. Once the writer is closed, new row groups can no longer
> be appended to that file.
> The only options in that situation are to recreate the file or to write
> multiple files to the dataset.
>
> This is possible with fastparquet by passing the _append=True_ parameter;
> see [API — fastparquet 0.7.1 documentation
> |https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write].
> A feature to append row groups to an existing file would be beneficial in
> pyarrow as well.
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)