[ https://issues.apache.org/jira/browse/ARROW-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16616336#comment-16616336 ]
Wes McKinney commented on ARROW-3020:
-------------------------------------
Patches welcome.
> [Python] Addition of option to allow empty Parquet row groups
> -------------------------------------------------------------
>
> Key: ARROW-3020
> URL: https://issues.apache.org/jira/browse/ARROW-3020
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++, Python
> Reporter: Alex Mendelson
> Priority: Major
> Labels: parquet
> Fix For: 0.12.0
>
>
> While our use case is not common, I was able to find one related request from
> roughly a year ago. Could this be added as a feature?
> https://issues.apache.org/jira/browse/PARQUET-1047
> *Motivation*
> We have an application where each row is associated with one of N contexts,
> though a minority of contexts may have no associated rows. When we encounter
> the nth context, we want to retrieve all of its rows. Row groups would
> provide a natural way to index the data, as the nth context could map
> directly to the nth row group.
> Unfortunately, this is not currently possible, as pyarrow does not support
> writing empty row groups. If one writes a pyarrow.Table containing zero rows
> using pyarrow.parquet.ParquetWriter, the table is omitted from the final
> file, which shifts every later row group and breaks the indexing.
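
To make the indexing problem concrete, here is a minimal sketch in plain Python. It does not call pyarrow; it only simulates a writer that drops zero-row groups, as the description above reports. The names write_row_groups and build_context_index are hypothetical, and the side index is one assumed workaround, not something proposed in the issue.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def write_row_groups(groups: List[List[Row]]) -> List[List[Row]]:
    """Simulate a writer that silently drops zero-row groups.

    Hypothetical stand-in for the behavior reported in the issue;
    this is not a pyarrow call.
    """
    return [list(group) for group in groups if len(group) > 0]


def build_context_index(groups: List[List[Row]]) -> Dict[int, Optional[int]]:
    """Map each context number to the position of its row group in the
    written file, or None when the context has no rows (assumed workaround)."""
    index: Dict[int, Optional[int]] = {}
    position = 0
    for context, group in enumerate(groups):
        if len(group) > 0:
            index[context] = position
            position += 1
        else:
            index[context] = None
    return index


# Three contexts; context 1 has no rows.
groups_by_context: List[List[Row]] = [
    [{"ctx": 0, "value": "a"}],
    [],
    [{"ctx": 2, "value": "b"}, {"ctx": 2, "value": "c"}],
]

written = write_row_groups(groups_by_context)
index = build_context_index(groups_by_context)

# Naive lookup: asking for row group 1 returns the rows of context 2,
# because the empty group for context 1 was never written.
naive_rows_for_context_1 = written[1]

# Lookup through the side index: context 1 is reported as empty,
# and context 2 resolves to its actual position in the file.
position_of_context_2 = index[2]
rows_for_context_2 = written[position_of_context_2]
```

If empty row groups could be written, the side index would be unnecessary, since position n in the file would always correspond to context n.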
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)