Wes McKinney created ARROW-8599:
-----------------------------------
Summary: [C++][Parquet] Optional parallel processing when writing
Parquet files
Key: ARROW-8599
URL: https://issues.apache.org/jira/browse/ARROW-8599
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney
Fix For: 1.0.0
If we permit encoded columns in row groups to be buffered in memory rather than
immediately written out to the {{OutputStream}}, then we can use multiple
threads for the encoding / compression of columns. Combined with a separate
thread to take the encoded columns and write them out to disk, this should
yield substantially improved file write times.
This could be enabled through an option, since it would increase memory use when
writing. Memory use can be bounded to some extent by limiting the size of row
groups.