[GitHub] [arrow] drin commented on issue #13981: [C++][Compute]Performance of arrow::compute compared to raw operations on `arrow::Array`

GitBox Tue, 30 Aug 2022 11:57:03 -0700


drin commented on issue #13981:
URL: https://github.com/apache/arrow/issues/13981#issuecomment-1232043790


   I modified my version of `ArrayGreaterThan` (which is very similar to your 
version of `GreaterEqual`) to create a `FastArrayGreaterThan` benchmark. (I 
just realized that these should all be named "GreaterEqual" rather than 
"GreaterThan", sorry.)
   
   In this version of the benchmark 
(https://gist.github.com/drin/8dfa8ee631ef17b63dca5c2348f20d3c#file-fast_compute_greater_equal_benchmark-cc),
 I followed the implementation of the "greater_equal" function and computed 
the values in batches. It is only a rough approximation, since I kept the 
effort low, but as the benchmark results show, the time gets much closer to 
`ComputeGreaterThan`. I did this to check whether the compute version really 
should be that much faster than the raw versions, and I think the result 
confirms it.
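
To make the batching idea concrete, here is a minimal sketch in plain C++. It is not the code from the gist and not Arrow's actual kernel: the names `BatchedGreaterEqual` and `GetBit` are hypothetical, the batch width of 32 is an assumption of this sketch, and `std::vector` stands in for Arrow buffers. The point is only to show the shape of the loop: compare a fixed-size batch without branches, then pack the results into a bitmap.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): compares each value against a
// threshold and returns an LSB-first packed bitmap, one bit per input value.
std::vector<uint8_t> BatchedGreaterEqual(const std::vector<int64_t>& values,
                                         int64_t threshold) {
  constexpr std::size_t kBatchSize = 32;  // assumed batch width for this sketch
  std::vector<uint8_t> bitmap((values.size() + 7) / 8, 0);

  // Index one past the last value that belongs to a full batch.
  const std::size_t full_end = values.size() - values.size() % kBatchSize;
  std::size_t i = 0;
  for (; i < full_end; i += kBatchSize) {
    // Step 1: branch-free comparisons into a small temporary buffer.
    uint8_t results[kBatchSize];
    for (std::size_t j = 0; j < kBatchSize; ++j) {
      results[j] = static_cast<uint8_t>(values[i + j] >= threshold);
    }
    // Step 2: pack the 32 results into 4 bitmap bytes.
    for (std::size_t byte = 0; byte < kBatchSize / 8; ++byte) {
      uint8_t packed = 0;
      for (std::size_t bit = 0; bit < 8; ++bit) {
        packed |= static_cast<uint8_t>(results[byte * 8 + bit] << bit);
      }
      bitmap[i / 8 + byte] = packed;
    }
  }
  // Tail: the remaining (fewer than 32) values are handled one at a time.
  for (; i < values.size(); ++i) {
    if (values[i] >= threshold) {
      bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return bitmap;
}

// Hypothetical helper: reads bit i of an LSB-first packed bitmap.
bool GetBit(const std::vector<uint8_t>& bitmap, std::size_t i) {
  return ((bitmap[i / 8] >> (i % 8)) & 1u) != 0;
}
```

Calling `BatchedGreaterEqual(values, 25)` and then `GetBit(bitmap, i)` gives the same answer as `values[i] >= 25` for every `i`. The inner loops have a fixed trip count and no data-dependent branches, which is the kind of code a compiler can usually unroll or vectorize, so this may be part of why the batched version gets closer to the compute function.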
   
   For reference, this is the code I read to understand how "greater_equal" 
is implemented:
   * where the function is constructed: 
[scalar_compare.cc#L894](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_compare.cc#L894)
   * where the kernel is registered for numeric inputs: 
[scalar_compare.cc#L396](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_compare.cc#L396)
   * where the kernel is being constructed (for <array, scalar> inputs), I 
think: 
[scalar_compare.cc#L322](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_compare.cc#L322)
   * where the kernel is actually implemented: 
[scalar_compare.cc#L190](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_compare.cc#L190)
   
   I would be interested to see whether this type of implementation is faster 
for you. I also wonder whether SIMD is playing a role, since my SIMD level is 
configured as follows (according to the CMake output from my first build):
   ```text
   --   ARROW_SIMD_LEVEL=NEON [default=NONE|SSE4_2|AVX2|AVX512|NEON|DEFAULT]
   --       Compile-time SIMD optimization level
   --   ARROW_RUNTIME_SIMD_LEVEL=MAX [default=NONE|SSE4_2|AVX2|AVX512|MAX]
   --       Max runtime SIMD optimization level
   ```
   
   Let me know if this is helpful for you at all!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

