felipecrv commented on PR #39817: URL: https://github.com/apache/arrow/pull/39817#issuecomment-1912925883
Benchmarks of `sort` and `rank` on chunked arrays, both heavy users of `ChunkResolver`. I took 3 measurements after roughly every change to give an idea of the level of noise. The purple group (`bounds-check-fix`) is where I fixed the out-of-bounds access bug that exists on `main` (it was not introduced by my optimizations). The two groups after it bring the throughput back to what was achieved before the bounds check was added.

Ideas that were tried and either made no difference or made throughput worse:

- Removing the use of `std::atomic` completely. Relaxed atomic operations are enough, which is good because removing the atomics could introduce bugs.
- Starting the `Bisect` on different ranges depending on the results of the branches.

[1] `ninja arrow-compute-vector-sort-benchmark && ./**/arrow-compute-vector-sort-benchmark --benchmark_filter="ChunkedArray(Sort|Rank).*Int64.*65536/100(/tiebreaker:2|$)" --benchmark_out_format=csv`
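For readers unfamiliar with the technique discussed above, here is a minimal sketch of a chunk resolver that combines a cached-chunk hint (read and written with relaxed atomics), a bounds check, and a bisection over chunk offsets. It is not Arrow's actual `ChunkResolver`: the names `SketchChunkResolver`, `ChunkLocation`, and `Resolve`, and the convention of returning `chunk_index == num_chunks` for out-of-range indices, are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: which chunk, and where inside it.
struct ChunkLocation {
  int64_t chunk_index;
  int64_t index_in_chunk;
};

// Hypothetical resolver, for illustration only.
class SketchChunkResolver {
 public:
  explicit SketchChunkResolver(const std::vector<int64_t>& chunk_lengths)
      : offsets_(chunk_lengths.size() + 1, 0), cached_chunk_(0) {
    // offsets_[i] is the logical index of the first element of chunk i;
    // offsets_.back() is the total length.
    for (std::size_t i = 0; i < chunk_lengths.size(); ++i) {
      assert(chunk_lengths[i] >= 0);
      offsets_[i + 1] = offsets_[i] + chunk_lengths[i];
    }
  }

  ChunkLocation Resolve(int64_t index) const {
    const int64_t num_chunks = static_cast<int64_t>(offsets_.size()) - 1;
    // Relaxed ordering suffices in this sketch: the cached value is only a
    // hint and is always validated against offsets_ before being used.
    const int64_t cached = cached_chunk_.load(std::memory_order_relaxed);
    if (cached < num_chunks && index >= offsets_[cached] &&
        index < offsets_[cached + 1]) {
      return {cached, index - offsets_[cached]};
    }
    // Bounds check: never bisect (or cache) for an out-of-range index.
    if (index < 0 || index >= offsets_.back()) {
      return {num_chunks, 0};
    }
    // Bisect for the last chunk whose starting offset is <= index.
    // Invariant: the answer lies in [lo, lo + n).
    int64_t lo = 0;
    int64_t n = num_chunks;
    while (n > 1) {
      const int64_t half = n >> 1;
      const int64_t mid = lo + half;
      if (index >= offsets_[mid]) {
        lo = mid;
        n -= half;
      } else {
        n = half;
      }
    }
    cached_chunk_.store(lo, std::memory_order_relaxed);
    return {lo, index - offsets_[lo]};
  }

 private:
  std::vector<int64_t> offsets_;
  mutable std::atomic<int64_t> cached_chunk_;
};
```

Sorting and ranking tend to resolve many nearby indices in a row, so the cached hint often avoids the bisection entirely; the bounds check keeps an out-of-range index from ever being stored as a hint.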
