westonpace opened a new pull request #12150:
URL: https://github.com/apache/arrow/pull/12150
* The benchmark named ReadFile is misleading since it is actually reading
from an in-memory buffer and no OS "read" call is ever issued.
* Renamed ReadTempFile to ReadCachedFile and added a second case,
ReadUncachedFile. The former reads a file that is in the OS page cache; the
latter forces the read to actually hit the disk.
* The TempFile benchmarks were not actually writing the correct amount of
data and were reporting unrealistically high rates as a result.
* Added a "partial read" parameter which, when true, reads only 1/8 of the
columns in the file so we can see the impact of projection pushdown.
* Slightly reduced the range of parameters to keep the benchmark time
reasonable (8k columns wasn't telling us anything more than 4k columns).
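
For readers unfamiliar with the cached/uncached distinction, here is a minimal Python sketch of one common way to get both kinds of read on Linux. It is not the code in this PR (the benchmark itself is C++), and the helper names `write_file`, `evict_from_page_cache`, `read_file` and `roundtrip` are made up for illustration. It assumes `posix_fadvise` is available and that the temp directory is backed by a real disk rather than tmpfs. The eviction is only advisory, so the kernel may still keep some pages.

```python
import os
import tempfile


def write_file(path, payload):
    """Write payload and fsync it so no dirty pages remain for the file."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def evict_from_page_cache(path):
    """Ask the kernel to drop cached pages for path.

    Returns False if posix_fadvise is unavailable (e.g. macOS, Windows)
    or the call fails. The request is advisory, not a guarantee.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0, len=0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    except OSError:
        return False
    finally:
        os.close(fd)
    return True


def read_file(path):
    with open(path, "rb") as f:
        return f.read()


def roundtrip(payload):
    """Read the same file once (likely) cached and once after eviction."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")
        write_file(path, payload)
        cached = read_file(path)    # likely served from the page cache
        evict_from_page_cache(path)
        uncached = read_file(path)  # more likely to reach the storage device
    return cached, uncached
```

Both reads return the same bytes; only where they are served from differs, which is why the two cases need separate benchmark names.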
NOTE: This PR will invalidate some previous results from
arrow-ipc-read-write-benchmark, disrupting conbench & other monitoring efforts.
This is because those previous results were wrong.
Adding a new parameter and renaming some of the benchmarks likely invalidates
additional arrow-ipc-read-write-benchmark results as well.
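
A rough sketch of why the old TempFile rates were too high, assuming the benchmark credited itself with more bytes than it actually wrote. The function names and all numbers below are hypothetical and are not measurements from this PR.

```python
def throughput(bytes_counted, elapsed_seconds):
    """Bytes per second, as a benchmark would report it."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return bytes_counted / elapsed_seconds


def inflation_factor(bytes_claimed, bytes_actual):
    """How much the reported rate overstates the real one."""
    if bytes_actual <= 0:
        raise ValueError("bytes_actual must be positive")
    return bytes_claimed / bytes_actual


MIB = 1 << 20

# Hypothetical case: the benchmark reports 64 MiB processed,
# but only 8 MiB was really written, in half a second.
reported = throughput(64 * MIB, 0.5)   # rate the benchmark would print
actual = throughput(8 * MIB, 0.5)      # rate for the bytes really written
factor = inflation_factor(64 * MIB, 8 * MIB)
```

Because elapsed time is the same in both cases, the reported rate is off by exactly the ratio of claimed to actual bytes, so fixing the amount of data written shifts every affected result.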