zanmato1984 commented on issue #43687:
URL: https://github.com/apache/arrow/issues/43687#issuecomment-2290534679

   > > However I would try to answer myself: for other types, the compiler 
generates different SIMD code for AVX512 than for AVX2. So for these kernels, 
we have to mark them AVX512-only because an AVX2-only architecture wouldn't 
know them. For string-like and fixed-size-binary types, on the other hand, we 
are sure that the SIMD code generated by the compiler for both AVX512 and AVX2 
are the same (all AVX2-capable?)? So these kernels are actually AVX2-capable, 
hence we specify a more relaxing SIMD level (AVX2) for them?
   > 
   > @zanmato1984 yes. I mean, you have to check the code carefully, but that 
is the intention: don't instantiate the AVX512 template if the template never 
needs AVX512 instructions and AVX2 is enough.
   > 
   > `SumArray` is instantiated with AVX512 unnecessarily because the 
implementation of SumArray for most types doesn't, in fact, use AVX512 
instructions. It's very hard to structure this code correctly. I have never 
written SIMD kernels in Arrow. All I learned was from trying to answer support 
questions like the ones you're asking now.
   
   I appreciate the explanation. I now have a further question: if the minmax 
implementations for AVX2 and AVX512 are the same (both optimized only to the 
AVX2 level), does this mean the scalar agg function minmax ends up with two 
identical kernels registered at the same SIMD level? (Suppose both compiler 
switches `ARROW_HAVE_RUNTIME_AVX2/512` are on, which should normally be the case.)
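   
   To make the question concrete, here is a minimal sketch of the situation I 
have in mind. It is not Arrow's actual registration code: `Kernel`, `Function`, 
`RegisterMinMax`, the kernel names, and the tie-breaking rule in `Dispatch` are 
all hypothetical stand-ins that I made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not Arrow's real classes.
enum class SimdLevel { NONE = 0, AVX2 = 1, AVX512 = 2 };

struct Kernel {
  std::string name;
  SimdLevel simd_level;
};

struct Function {
  std::vector<Kernel> kernels;

  void AddKernel(Kernel k) { kernels.push_back(std::move(k)); }

  // Pick the kernel with the highest SIMD level the CPU supports.
  // Assumption: on a tie, the kernel registered first wins.
  const Kernel* Dispatch(SimdLevel cpu_level) const {
    const Kernel* best = nullptr;
    for (const auto& k : kernels) {
      if (k.simd_level > cpu_level) continue;  // CPU cannot run this kernel
      if (best == nullptr || k.simd_level > best->simd_level) best = &k;
    }
    return best;
  }
};

// Models both runtime switches being on: the AVX2 and the AVX512 translation
// units each register a kernel, but both declare SimdLevel::AVX2 because the
// generated code only needs AVX2 instructions.
Function RegisterMinMax(bool have_runtime_avx2, bool have_runtime_avx512) {
  Function func;
  func.AddKernel({"minmax_scalar", SimdLevel::NONE});
  if (have_runtime_avx2) {
    func.AddKernel({"minmax_from_avx2_unit", SimdLevel::AVX2});
  }
  if (have_runtime_avx512) {
    func.AddKernel({"minmax_from_avx512_unit", SimdLevel::AVX2});
  }
  return func;
}
```

   With both flags set to true, this toy `Function` holds two kernels tagged 
`SimdLevel::AVX2`, so only one of them can ever be selected and the other is 
redundant. That is the duplication I am asking about.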
   
   Thank you!

