Safyre Anderson created ARROW-1400:
--------------------------------------

             Summary: Ability to create partitions when writing to Parquet
                 Key: ARROW-1400
                 URL: https://issues.apache.org/jira/browse/ARROW-1400
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
    Affects Versions: 0.6.0
         Environment: Mac OS Sierra 10.12.6
            Reporter: Safyre Anderson
            Priority: Minor


I'm fairly new to pyarrow, so I apologize if this is already a feature, but I
couldn't find a solution in the documentation or in an existing issue. Basically,
I'm trying to export pandas DataFrames to .parquet files with partitions. I can
see that pyarrow.parquet has a way of reading partitioned .parquet files,
but there's no indication that it can write them. E.g., it would be
nice if pyarrow.parquet.write_table() had a parameter that took a list
of columns to partition the table by, similar to the "partitionBy" option of
the PySpark DataFrame writer (df.write.partitionBy(...).parquet(...)).
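
To make the request concrete, here is a rough sketch of the directory layout I
have in mind. It is plain Python and not pyarrow code: group_by_partitions is a
hypothetical helper, the rows are made-up sample data, and the "col=value"
directory naming is assumed to match what Spark produces with partitionBy.

```python
from collections import defaultdict


def group_by_partitions(rows, partition_cols):
    """Group row dicts by partition directory (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        # Build a directory such as "year=2017/month=8" from the partition values.
        directory = "/".join(f"{col}={row[col]}" for col in partition_cols)
        # Partition values are encoded in the path, so drop them from the stored row.
        remainder = {k: v for k, v in row.items() if k not in partition_cols}
        groups[directory].append(remainder)
    return dict(groups)


# Made-up sample data standing in for a pandas DataFrame.
rows = [
    {"year": 2017, "month": 8, "value": 1.0},
    {"year": 2017, "month": 8, "value": 2.0},
    {"year": 2017, "month": 9, "value": 3.0},
]

layout = group_by_partitions(rows, ["year", "month"])
for directory, chunk in sorted(layout.items()):
    print(directory, len(chunk))
```

Each group would then be written as its own .parquet file under its directory,
which is the step I would like pyarrow to handle when given a list of
partition columns.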

Referenced links:
https://arrow.apache.org/docs/python/parquet.html
https://arrow.apache.org/docs/python/parquet.html?highlight=pyarrow%20parquet%20partition



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
