drin commented on issue #13981: URL: https://github.com/apache/arrow/issues/13981#issuecomment-1232043790
I modified my version of `ArrayGreaterThan` (which is very similar to your version of `GreaterEqual`) to create a `FastArrayGreaterThan` benchmark. (I just realized that they should all be named "GreaterEqual" instead of "GreaterThan", sorry.)

For this version of the benchmark (https://gist.github.com/drin/8dfa8ee631ef17b63dca5c2348f20d3c#file-fast_compute_greater_equal_benchmark-cc), I looked at the implementation of the "greater_equal" function and changed my code to compute the values in batches. It is only an approximation, since I limited the effort I put into it, but as you can see from the benchmark results, the time gets much closer to `ComputeGreaterThan`. I did this to check whether I should really be seeing such a fast time compared to the raw versions, and I think this validates it.

For reference, this is the implementation I was reading to understand how "greater_equal" is implemented:

* where the function is constructed: [scalar_compare.cc#L894](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_compare.cc#L894)
* where the kernel is registered for numeric inputs: [scalar_compare.cc#L396](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_compare.cc#L396)
* where the kernel is being constructed (for <array, scalar> inputs), I think: [scalar_compare.cc#L322](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_compare.cc#L322)
* the actual kernel implementation: [scalar_compare.cc#L190](https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_compare.cc#L190)

I would be interested to see whether this type of implementation is faster for you or not.
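To make the batch-wise idea above concrete, here is a minimal sketch in plain C++ with no Arrow dependency. It is not the code from the gist and not Arrow's kernel: the names `BatchedGreaterEqual` and `GetBit` are hypothetical, the batch size of 32 is an assumption chosen for illustration, and the LSB-first bitmap layout is simply the convention this sketch picks. Each batch is first compared into a small temporary buffer using a tight, branch-free loop, and the results are then packed into the output bitmap; a scalar loop handles the tail.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an Arrow API): computes values[i] >= threshold
// for every element and returns the results as a bitmap, one bit per value,
// least-significant bit first within each byte.
std::vector<uint8_t> BatchedGreaterEqual(const std::vector<int64_t>& values,
                                         int64_t threshold) {
  constexpr std::size_t kBatchSize = 32;  // assumed batch size, multiple of 8
  const std::size_t n = values.size();
  std::vector<uint8_t> bitmap((n + 7) / 8, 0);

  std::size_t i = 0;
  // Full batches: compare into a temporary buffer, then pack the bits.
  for (; i + kBatchSize <= n; i += kBatchSize) {
    uint8_t results[kBatchSize];
    for (std::size_t j = 0; j < kBatchSize; ++j) {
      // Simple loop body with no branches, which compilers can vectorize.
      results[j] = static_cast<uint8_t>(values[i + j] >= threshold);
    }
    for (std::size_t byte = 0; byte < kBatchSize / 8; ++byte) {
      uint8_t packed = 0;
      for (std::size_t bit = 0; bit < 8; ++bit) {
        packed |= static_cast<uint8_t>(results[byte * 8 + bit] << bit);
      }
      // i is always a multiple of kBatchSize here, so i / 8 is exact.
      bitmap[i / 8 + byte] = packed;
    }
  }

  // Tail: fewer than kBatchSize values remain, so set bits one at a time.
  for (; i < n; ++i) {
    if (values[i] >= threshold) {
      bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return bitmap;
}

// Hypothetical helper: reads bit i from a bitmap produced above.
bool GetBit(const std::vector<uint8_t>& bitmap, std::size_t i) {
  return ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
}
```

Separating the comparison loop from the bit packing is the point of the sketch: the inner comparison has no data-dependent branches or bit manipulation, so it is a plausible candidate for auto-vectorization, whereas a naive per-element loop that sets one bit at a time is not.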

I am also wondering whether SIMD is playing a role here, since my SIMD level is configured as follows (according to the CMake output from when I first built):

```bash
-- ARROW_SIMD_LEVEL=NEON [default=NONE|SSE4_2|AVX2|AVX512|NEON|DEFAULT]
--   Compile-time SIMD optimization level
-- ARROW_RUNTIME_SIMD_LEVEL=MAX [default=NONE|SSE4_2|AVX2|AVX512|MAX]
--   Max runtime SIMD optimization level
```

Let me know if this is helpful for you at all!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
