felipecrv commented on issue #43687:
URL: https://github.com/apache/arrow/issues/43687#issuecomment-2289621635

   > However I would try to answer myself: for other types, the compiler 
generates different SIMD code for AVX512 than for AVX2. So for these kernels, 
we have to mark them AVX512-only because an AVX2-only architecture wouldn't 
know them. For string-like and fixed-size-binary types, on the other hand, we 
are sure that the SIMD code generated by the compiler for both AVX512 and AVX2 
is the same (all AVX2-capable?)? So these kernels are actually AVX2-capable, 
hence we specify a more relaxed SIMD level (AVX2) for them?
   
   @zanmato1984 yes. I mean, you have to check the code carefully, but that is 
the intention: don't instantiate the AVX512 template if the template never 
needs AVX512 instructions and AVX2 is enough.
   
   `SumArray` is instantiated with AVX512 unnecessarily because the 
implementation of `SumArray` for most types doesn't, in fact, use AVX512 
instructions. It's very hard to structure this code correctly. I have never 
written SIMD kernels in Arrow. All I learned was from trying to answer support 
questions like the ones you're asking now.
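
   To make the intention concrete, here is a minimal, self-contained sketch of 
the idea. It is not Arrow's actual code: `SimdLevel`, `SumValues`, 
`KernelRegistry`, `MakeExampleRegistry` and the kernel names are all 
hypothetical, and in a real build each level would be compiled in its own 
translation unit with its own flags (for example `-mavx2` or `-mavx512f`). The 
point is only that a kernel registered at AVX2 is still picked on an 
AVX512-capable CPU, so a separate AVX512 instantiation adds nothing when the 
generated code would be identical.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical SIMD level tag, ordered from least to most capable.
enum class SimdLevel { NONE = 0, AVX2 = 1, AVX512 = 2 };

using SumFn = int64_t (*)(const std::vector<int64_t>&);

// One source template. In a real build, each instantiation would live in a
// file compiled with different flags; here every instantiation compiles to
// the same scalar loop, which is exactly the "AVX2 is enough" situation.
template <SimdLevel Level>
int64_t SumValues(const std::vector<int64_t>& values) {
  int64_t sum = 0;
  for (int64_t v : values) sum += v;
  return sum;
}

// Hypothetical registry mapping a kernel name to one function per level.
class KernelRegistry {
 public:
  void Add(const std::string& name, SimdLevel level, SumFn fn) {
    kernels_[name][level] = fn;
  }

  // Returns the kernel with the highest level not exceeding `cpu`,
  // or nullptr if no registered kernel can run on that CPU.
  SumFn Dispatch(const std::string& name, SimdLevel cpu) const {
    auto it = kernels_.find(name);
    if (it == kernels_.end()) return nullptr;
    SumFn best = nullptr;
    for (const auto& entry : it->second) {  // iterates in ascending level
      if (entry.first <= cpu) best = entry.second;
    }
    return best;
  }

 private:
  std::map<std::string, std::map<SimdLevel, SumFn>> kernels_;
};

KernelRegistry MakeExampleRegistry() {
  KernelRegistry registry;
  // Assumed case 1: the AVX512 build really differs, so register all levels.
  registry.Add("numeric_kernel", SimdLevel::NONE, SumValues<SimdLevel::NONE>);
  registry.Add("numeric_kernel", SimdLevel::AVX2, SumValues<SimdLevel::AVX2>);
  registry.Add("numeric_kernel", SimdLevel::AVX512,
               SumValues<SimdLevel::AVX512>);
  // Assumed case 2: the code never needs AVX512 instructions, so stop at
  // AVX2 and skip the AVX512 instantiation entirely.
  registry.Add("binary_like_kernel", SimdLevel::NONE,
               SumValues<SimdLevel::NONE>);
  registry.Add("binary_like_kernel", SimdLevel::AVX2,
               SumValues<SimdLevel::AVX2>);
  return registry;
}
```

   With this registry, dispatching `"binary_like_kernel"` for an AVX512 CPU 
falls back to the AVX2 entry, which is the behavior you want when AVX2 is 
enough.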


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

