Mackenzie created ARROW-3915:
--------------------------------

             Summary: [Python] Support partition columns when incrementally writing
                 Key: ARROW-3915
                 URL: https://issues.apache.org/jira/browse/ARROW-3915
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
    Affects Versions: 0.11.1
            Reporter: Mackenzie


Currently, `partition_cols` support in pyarrow is implemented here:
[https://github.com/apache/arrow/blob/69d207ff446c76f78fe27b960e7ebe89a607d992/python/pyarrow/parquet.py#L1205-L1235].

However, there is no easy way to partition by column when writing a dataset
incrementally with `ParquetWriter`. It would be very helpful if the column
partitioning logic were made more modular so that `ParquetWriter` could reuse it.
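To illustrate what "more modular" partitioning logic might look like, here is a minimal, dependency-free sketch. `partition_rows` is a hypothetical helper, not part of pyarrow, and rows are plain dicts rather than Arrow tables. It groups rows into Hive-style `col=value` sub-directories, the layout `write_to_dataset` uses, so that both a one-shot writer and an incremental writer could call the same function.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Hypothetical row representation: one dict per record (not an Arrow table).
Row = Dict[str, object]


def partition_rows(rows: Sequence[Row], partition_cols: Sequence[str]) -> Dict[str, List[Row]]:
    """Hypothetical helper: group rows by Hive-style 'col=value' sub-directory."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        # Relative directory for this row, e.g. "year=2018/month=11".
        subdir = "/".join(f"{col}={row[col]}" for col in partition_cols)
        # Partition values are encoded in the path, so drop them from the stored row.
        groups[subdir].append({k: v for k, v in row.items() if k not in partition_cols})
    return dict(groups)


rows = [
    {"year": 2018, "month": 11, "x": 1},
    {"year": 2018, "month": 12, "x": 2},
    {"year": 2018, "month": 11, "x": 3},
]
print(partition_rows(rows, ["year", "month"]))
```

Dropping the partition columns from each row is a design assumption of this sketch; a real implementation would write each group to its own file under the root path.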

One option would be to support a `partition_cols` keyword argument in
`ParquetWriter.write_table`. However, this would make it possible for
subsequent files to have inconsistent partition columns. A better approach
might be to pass `partition_cols` as a keyword argument when constructing the
`ParquetWriter` and manage it as a property whose setter raises an error if
called while the writer is open.
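The proposed semantics can be sketched with a plain Python property. `PartitionedWriterSketch` is a hypothetical stand-in, not pyarrow's `ParquetWriter`, and it only records rows instead of writing Parquet files. It fixes `partition_cols` at construction, validates each `write_table` call against it, and lets the setter succeed only after `close()`.

```python
from typing import Dict, List, Sequence, Tuple


class PartitionedWriterSketch:
    """Hypothetical writer that fixes partition_cols at construction time.

    Not the pyarrow API: a stand-in showing the proposed property semantics.
    """

    def __init__(self, root_path: str, partition_cols: Sequence[str] = ()) -> None:
        self.root_path = root_path
        self._partition_cols: Tuple[str, ...] = tuple(partition_cols)
        self.rows_written: List[Dict[str, object]] = []
        self.is_open = True  # a real writer would open its output here

    @property
    def partition_cols(self) -> Tuple[str, ...]:
        return self._partition_cols

    @partition_cols.setter
    def partition_cols(self, cols: Sequence[str]) -> None:
        # Changing the scheme mid-write could produce inconsistent files, so refuse.
        if self.is_open:
            raise RuntimeError("cannot change partition_cols while the writer is open")
        self._partition_cols = tuple(cols)

    def write_table(self, rows: Sequence[Dict[str, object]]) -> None:
        if not self.is_open:
            raise RuntimeError("writer is closed")
        # Validate every row before recording any of them.
        for row in rows:
            missing = [c for c in self._partition_cols if c not in row]
            if missing:
                raise KeyError(f"row is missing partition columns: {missing}")
        # A real implementation would route each row to its partition directory here.
        self.rows_written.extend(rows)

    def close(self) -> None:
        self.is_open = False


writer = PartitionedWriterSketch("dataset_root", partition_cols=["year"])
writer.write_table([{"year": 2018, "x": 1}])
writer.close()
writer.partition_cols = ["month"]  # allowed only once the writer is closed
```

Raising in the setter, rather than silently ignoring the new value, makes a mid-write change of partitioning scheme fail loudly instead of producing a dataset with mixed layouts.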



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
