David Li created ARROW-16944:
--------------------------------
Summary: [C++] Create macro-benchmarks of file format readers
Key: ARROW-16944
URL: https://issues.apache.org/jira/browse/ARROW-16944
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: David Li

Currently we have (some) microbenchmarks, but measuring performance of our
various readers (CSV, JSON, IPC, Parquet, ORC) over "real world" files would
also be interesting and hopefully more illustrative of the use cases we
actually care about. Such benchmarks may be expensive, though.

Ideally, we would do this in a variety of scenarios: in-memory (to focus on CPU
optimization), on-disk (though such measurements would likely be extremely
noisy?), and over the network (perhaps with something like Minio + Toxiproxy to
try to have a consistent, reproducible setup) so that we can also judge the I/O
characteristics of the readers.
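
As a rough illustration of the in-memory scenario only, a timing harness might look something like the sketch below. It uses nothing but the C++ standard library. The names BenchResult, RunInMemory, and CountRecords are hypothetical and are not part of Arrow; CountRecords is a toy stand-in for a real reader call. Reporting the median rather than the mean is just one possible way to dampen noise.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Result of timing one reader over one in-memory input (hypothetical type).
struct BenchResult {
  double median_seconds;    // median wall-clock time per iteration
  double bytes_per_second;  // input size divided by the median time
};

// Hypothetical stand-in for a real reader: counts newline-terminated records.
// A real macro-benchmark would invoke the CSV/JSON/IPC/Parquet/ORC reader here.
std::size_t CountRecords(const std::string& buffer) {
  return static_cast<std::size_t>(
      std::count(buffer.begin(), buffer.end(), '\n'));
}

// Runs `reader` over `buffer` `iterations` times and reports the median time
// (the upper median when the iteration count is even).
BenchResult RunInMemory(
    const std::function<std::size_t(const std::string&)>& reader,
    const std::string& buffer, int iterations) {
  assert(iterations > 0);
  std::vector<double> seconds;
  seconds.reserve(static_cast<std::size_t>(iterations));
  volatile std::size_t sink = 0;  // keeps the reader's result observable
  for (int i = 0; i < iterations; ++i) {
    const auto start = std::chrono::steady_clock::now();
    sink = reader(buffer);
    const auto stop = std::chrono::steady_clock::now();
    seconds.push_back(std::chrono::duration<double>(stop - start).count());
  }
  (void)sink;
  std::sort(seconds.begin(), seconds.end());
  const double median = seconds[seconds.size() / 2];
  // Guard against a zero reading from a coarse clock on tiny inputs.
  const double throughput =
      median > 0.0 ? static_cast<double>(buffer.size()) / median : 0.0;
  return BenchResult{median, throughput};
}
```

The on-disk and network scenarios would presumably reuse the same loop but hand the reader a file or remote object instead of a buffer already in memory, which is where the noise concerns above come in.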