David Li created ARROW-16944:
--------------------------------

             Summary: [C++] Create macro-benchmarks of file format readers
                 Key: ARROW-16944
                 URL: https://issues.apache.org/jira/browse/ARROW-16944
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
            Reporter: David Li


We currently have some microbenchmarks, but measuring the performance of our
various file readers (CSV, JSON, IPC, Parquet, ORC) on "real-world" files would
also be valuable and, hopefully, more representative of the use cases we
actually care about. Such benchmarks may be expensive to run, though.

Ideally, we would run these benchmarks in several scenarios: in-memory (to
focus on CPU optimization), on-disk (though such measurements would likely be
very noisy), and over the network (perhaps using something like MinIO +
Toxiproxy for a consistent, reproducible setup), so that we can also evaluate
the I/O characteristics of the readers.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
