Wes McKinney created PARQUET-1698:
-------------------------------------
Summary: [C++] Add reader option to pre-buffer entire serialized
row group into memory
Key: PARQUET-1698
URL: https://issues.apache.org/jira/browse/PARQUET-1698
Project: Parquet
Issue Type: Improvement
Components: parquet-cpp
Reporter: Wes McKinney
Fix For: cpp-1.6.0
In some scenarios (example: reading datasets from Amazon S3), reading columns
independently and allowing unbridled {{Read}} calls to the underlying file
handle can yield suboptimal performance. In such cases, it may be preferable to
first read the entire serialized row group into memory, then deserialize the
constituent columns from this in-memory buffer.
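To illustrate the difference in the number of {{Read}} calls, here is a minimal sketch. It does not use the parquet-cpp API: {{CountingFile}}, {{ChunkRange}}, {{ReadColumnsIndependently}} and {{ReadColumnsPreBuffered}} are hypothetical names invented for this example, and the "file" is just an in-memory string that counts how often it is read.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a high-latency file handle (NOT the parquet-cpp
// or Arrow API). It counts Read calls so the two strategies can be compared.
class CountingFile {
 public:
  explicit CountingFile(std::string data) : data_(std::move(data)) {}

  // Returns up to `length` bytes starting at `offset`.
  std::string Read(std::size_t offset, std::size_t length) {
    ++num_reads_;
    return data_.substr(offset, length);
  }

  int num_reads() const { return num_reads_; }

 private:
  std::string data_;
  int num_reads_ = 0;
};

// Byte range of one serialized column chunk inside the file (assumed layout).
struct ChunkRange {
  std::size_t offset;
  std::size_t length;
};

// Strategy A: one Read call per column chunk.
std::vector<std::string> ReadColumnsIndependently(
    CountingFile& file, const std::vector<ChunkRange>& chunks) {
  std::vector<std::string> out;
  for (const ChunkRange& c : chunks) {
    out.push_back(file.Read(c.offset, c.length));
  }
  return out;
}

// Strategy B: a single Read covering the whole row group, after which each
// column chunk is sliced out of the in-memory buffer.
std::vector<std::string> ReadColumnsPreBuffered(
    CountingFile& file, const std::vector<ChunkRange>& chunks) {
  std::vector<std::string> out;
  if (chunks.empty()) return out;

  // Compute the smallest byte range that covers every chunk.
  std::size_t begin = chunks.front().offset;
  std::size_t end = begin;
  for (const ChunkRange& c : chunks) {
    begin = std::min(begin, c.offset);
    end = std::max(end, c.offset + c.length);
  }

  const std::string buffer = file.Read(begin, end - begin);
  for (const ChunkRange& c : chunks) {
    out.push_back(buffer.substr(c.offset - begin, c.length));
  }
  return out;
}
```

With four column chunks, strategy A issues four reads against the file handle while strategy B issues one. On a store where each request carries a fixed latency cost, that difference is what the proposed option is meant to exploit.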
Note that such an option would not be appropriate as a default behavior for all
file handle types, since reads that select only a small fraction of the columns
(example: reading only 3 columns out of a file with 100 columns) would fetch far
more data than needed and will be suboptimal in some cases. I think it would be
better for "high latency" file systems to opt in to this option.
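As a rough sketch of why this should be opt-in, the snippet below compares the bytes fetched for one row group with and without pre-buffering. {{ReaderOptions}}, its {{pre_buffer_row_group}} field and {{BytesFetched}} are hypothetical names for illustration only, not an existing or proposed parquet-cpp interface, and the chunk sizes are made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical option struct (NOT the parquet-cpp API). The flag defaults to
// false so that callers must opt in explicitly.
struct ReaderOptions {
  bool pre_buffer_row_group = false;
};

// Bytes fetched from the file for one row group.
// chunk_sizes[i] is the serialized size of column chunk i; `selected` holds
// the indices of the columns the caller actually wants.
std::uint64_t BytesFetched(const std::vector<std::uint64_t>& chunk_sizes,
                           const std::vector<std::size_t>& selected,
                           const ReaderOptions& options) {
  std::uint64_t total = 0;
  if (options.pre_buffer_row_group) {
    // The whole serialized row group is read, whatever was selected.
    for (std::uint64_t size : chunk_sizes) total += size;
  } else {
    // Only the selected column chunks are read.
    for (std::size_t index : selected) total += chunk_sizes.at(index);
  }
  return total;
}
```

For a row group with 100 column chunks of 1000 bytes each and 3 selected columns, the default path fetches 3000 bytes and the pre-buffered path fetches 100000. Whether the saved round trips outweigh the extra bytes depends on the file system, which is why the choice is left to the caller.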
cc [~fsaintjacques] [~bkietz] [~apitrou]