[ https://issues.apache.org/jira/browse/ARROW-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-3915:
--------------------------------
Labels: parquet (was: )
> [Python] Support partition columns when incrementally writing
> -------------------------------------------------------------
>
> Key: ARROW-3915
> URL: https://issues.apache.org/jira/browse/ARROW-3915
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Affects Versions: 0.11.1
> Reporter: Mackenzie
> Priority: Major
> Labels: parquet
> Fix For: 0.13.0
>
>
> Currently, `partition_cols` support in pyarrow is implemented in:
> https://github.com/apache/arrow/blob/69d207ff446c76f78fe27b960e7ebe89a607d992/python/pyarrow/parquet.py#L1205-L1235
> However, there is no easy way to partition by column when writing
> datasets incrementally via `ParquetWriter`. It would be very helpful if the
> column partitioning logic were made more modular and reused in
> `ParquetWriter`.
> One option would be to support a `partition_cols` keyword argument in
> `ParquetWriter.write_table`. However, this would make it possible for
> subsequent files to be written with inconsistent partition columns. A better
> approach may be to pass `partition_cols` as a keyword argument when
> constructing `ParquetWriter` and expose it as a property whose setter raises
> an error if called while the writer is open.
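A minimal, dependency-free sketch of the constructor-kwarg design proposed above, assuming rows are plain dicts rather than `pyarrow.Table` objects. `partition_rows` and `PartitionedWriter` are hypothetical names, not part of pyarrow, and the in-memory `files` dict stands in for the Parquet files a real writer would produce. The `key=value` directory naming follows the Hive-style layout that `write_to_dataset` uses for `partition_cols`.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]  # assumption: stand-in for one row of a pyarrow.Table


def partition_rows(rows: Sequence[Row], partition_cols: Sequence[str]) -> Dict[str, List[Row]]:
    """Group rows into Hive-style "col=value/..." paths (hypothetical helper).

    Partition columns are dropped from each row because their values are
    encoded in the directory path.
    """
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        path = "/".join(f"{col}={row[col]}" for col in partition_cols)
        groups[path].append({k: v for k, v in row.items() if k not in partition_cols})
    return dict(groups)


class PartitionedWriter:
    """Hypothetical incremental writer whose partition columns are fixed while open.

    An in-memory dict of path -> rows stands in for the Parquet files a real
    writer would produce (one file per partition directory).
    """

    def __init__(self, partition_cols: Sequence[str]) -> None:
        self._is_open = False
        self.partition_cols = partition_cols  # routed through the setter below
        self._is_open = True
        self.files: Dict[str, List[Row]] = defaultdict(list)

    @property
    def partition_cols(self) -> List[str]:
        return list(self._partition_cols)

    @partition_cols.setter
    def partition_cols(self, cols: Sequence[str]) -> None:
        # Changing partitioning mid-stream would yield inconsistent files.
        if self._is_open:
            raise RuntimeError("cannot change partition_cols while the writer is open")
        self._partition_cols = list(cols)

    def write_table(self, rows: Sequence[Row]) -> None:
        if not self._is_open:
            raise RuntimeError("writer is closed")
        for path, part in partition_rows(rows, self._partition_cols).items():
            self.files[path].extend(part)

    def close(self) -> None:
        self._is_open = False


writer = PartitionedWriter(partition_cols=["year"])
writer.write_table([{"year": 2018, "x": 1}, {"year": 2019, "x": 2}])
writer.write_table([{"year": 2018, "x": 3}])
try:
    writer.partition_cols = ["x"]
except RuntimeError as exc:
    print(exc)  # cannot change partition_cols while the writer is open
writer.close()
print(dict(writer.files))
# {'year=2018': [{'x': 1}, {'x': 3}], 'year=2019': [{'x': 2}]}
```

Fixing `partition_cols` at construction means every `write_table` call in a session lands in the same directory layout. Since each `ParquetWriter` writes a single file, a real implementation would likely need to keep one underlying writer per partition path.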
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)