josiahyan opened a new pull request #8214:
URL: https://github.com/apache/arrow/pull/8214

It turns out that setSafe performs a very expensive integer division when computing buffer capacity: it divides by the field size, which isn't hardcoded. Although the field size is typically a power of 2, the division doesn't compile down to a bitshift.

Special-casing and forcing a bitshift operation results in a roughly 3x speedup (2.4x to 3.4x in the runs below) in benchmarks that use a hot loop to set Arrow vectors. We have a similar use case in an internal data-intensive service.

Benchmark results with arrow.enable_unsafe_memory_access=true

Before:
```
Benchmark                           Mode  Cnt   Score   Error  Units
IntBenchmarks.setIntDirectly        avgt   15   9.563 ± 0.335  us/op
IntBenchmarks.setWithValueHolder    avgt   15   9.266 ± 0.064  us/op
IntBenchmarks.setWithWriter         avgt   15  18.806 ± 0.154  us/op
```

After:
```
Benchmark                           Mode  Cnt   Score   Error  Units
IntBenchmarks.setIntDirectly        avgt   15   3.490 ± 0.175  us/op
IntBenchmarks.setWithValueHolder    avgt   15   3.806 ± 0.015  us/op
IntBenchmarks.setWithWriter         avgt   15   5.490 ± 0.304  us/op
```

See https://issues.apache.org/jira/browse/ARROW-9965 for further benchmarks and an analysis of the root cause of the slowdown.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
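
As a rough illustration of the division-versus-bitshift point in the pull request description, the sketch below contrasts the two ways of turning a buffer size in bytes into a value capacity. It is not the code from the PR: the class name `ValueCapacitySketch`, both method names, and the parameters `bufferBytes` and `typeWidth` are hypothetical, and the sketch assumes a non-negative buffer size and a positive width.

```java
// Hypothetical sketch; not the Arrow implementation or the PR's diff.
final class ValueCapacitySketch {
    private ValueCapacitySketch() {}

    /** General path: divide by a width that is only known at run time. */
    public static long capacityByDivision(long bufferBytes, int typeWidth) {
        return bufferBytes / typeWidth;
    }

    /**
     * Special-cased path: when the width is a power of two, replace the
     * division with a right shift; otherwise fall back to division.
     * Assumes bufferBytes >= 0, so the shift and the division agree.
     */
    public static long capacityByShift(long bufferBytes, int typeWidth) {
        boolean powerOfTwo = typeWidth > 0 && (typeWidth & (typeWidth - 1)) == 0;
        if (powerOfTwo) {
            // log2(typeWidth) equals the number of trailing zero bits.
            return bufferBytes >> Integer.numberOfTrailingZeros(typeWidth);
        }
        return bufferBytes / typeWidth;
    }

    public static void main(String[] args) {
        // A 1024-byte buffer of 4-byte ints takes the shift path.
        System.out.println(capacityByShift(1024L, 4));
        // A width of 3 is not a power of two, so this takes the division path.
        System.out.println(capacityByShift(30L, 3));
    }
}
```

In a real vector class the shift amount would more plausibly be computed once per vector rather than on every call; the per-call check here only keeps the sketch short.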
