[ https://issues.apache.org/jira/browse/ARROW-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17394120#comment-17394120 ]
Antoine Pitrou commented on ARROW-6579:
---------------------------------------
cc [~jorisvandenbossche]
> [Python] Parallel pyarrow.parquet.write_to_dataset
> --------------------------------------------------
>
> Key: ARROW-6579
> URL: https://issues.apache.org/jira/browse/ARROW-6579
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 0.14.1
> Reporter: Adam Lippai
> Priority: Major
> Labels: dataset, dataset-parquet-write, parquet
>
> pyarrow.parquet.write_to_dataset() is currently single-threaded and converts
> the table to and from pandas. We should lower the dataset writing to C++
> (dropping the pandas usage) so that it is easier to write the partitioned
> dataset using multiple threads.
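
For readers unfamiliar with the idea, here is a rough, standard-library-only sketch of what "writing the partitioned dataset using multiple threads" means conceptually. It is not Arrow's implementation and does not use pyarrow at all: the helper name `write_partitioned`, the CSV output, and the `part-0.csv` file name are assumptions made purely for illustration. Rows are grouped by a partition column, and each group is then written to its own Hive-style directory by a thread pool.

```python
import csv
import os
import tempfile
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def write_partitioned(rows, partition_key, base_dir, max_workers=4):
    """Group rows by partition_key and write one CSV file per group in parallel.

    Hypothetical helper for illustration only; not part of pyarrow.
    """
    # Step 1: split the rows into partitions keyed by the partition column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    def write_one(item):
        value, group = item
        # Hive-style directory name, e.g. base_dir/year=2020/
        part_dir = os.path.join(base_dir, f"{partition_key}={value}")
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "part-0.csv")
        # The partition value is encoded in the directory name, so drop the column.
        fields = [name for name in group[0] if name != partition_key]
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        return path

    # Step 2: partitions are independent, so their writes can run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sorted(pool.map(write_one, groups.items()))


rows = [
    {"year": 2019, "value": 1},
    {"year": 2020, "value": 2},
    {"year": 2020, "value": 3},
]
base_dir = tempfile.mkdtemp()
paths = write_partitioned(rows, "year", base_dir)
```

Because each partition ends up in a separate directory, the writers share no state, which is what makes this step straightforward to parallelize once the grouping no longer has to go through pandas.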