zanmato1984 commented on issue #43687: URL: https://github.com/apache/arrow/issues/43687#issuecomment-2290534679
> > However, I would try to answer myself: for other types, the compiler generates different SIMD code for AVX512 than for AVX2. So we have to mark those kernels AVX512-only, because an AVX2-only CPU can't run them. For string-like and fixed-size-binary types, on the other hand, are we sure that the SIMD code the compiler generates for AVX512 and for AVX2 is the same (i.e., it only needs AVX2)? If so, these kernels are actually AVX2-capable, and that is why we specify a less demanding SIMD level (AVX2) for them?
>
> @zanmato1984 yes. I mean, you have to check the code carefully, but that is the intention: don't instantiate the AVX512 template if the template never needs AVX512 instructions and AVX2 is enough.
>
> `SumArray` is instantiated with AVX512 unnecessarily, because the implementation of `SumArray` for most types doesn't, in fact, use AVX512 instructions. It's very hard to structure this code correctly.
>
> I have never written SIMD kernels in Arrow. All I learned was from trying to answer support questions like the ones you're asking now.

I appreciate the explanation. I have a follow-up question: the minmax implementations for AVX2 and AVX512 are the same (both are optimized only to the AVX2 level). Does this mean that the scalar aggregate function minmax ends up with two identical kernels at the same effective SIMD level? (Assume both compile-time switches `ARROW_HAVE_RUNTIME_AVX2/512` are on, which should normally be the case.)

Thank you!
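To make the question concrete, here is a minimal, hypothetical C++ sketch of per-SIMD-level kernel registration with runtime dispatch. `SimdLevel`, `MinMaxKernel`, `RegisteredKernel`, `BuildKernels` and `Dispatch` are illustrative names, not Arrow's actual API, and the "pick the highest level the CPU supports" rule is an assumption about how such a dispatcher might behave. It models the situation asked about: the kernel body does not depend on the level, so the AVX2 and AVX512 entries compute the same result, yet both get registered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical SIMD levels (illustration only, not Arrow's enum).
enum class SimdLevel { NONE = 0, AVX2 = 1, AVX512 = 2 };

struct MinMax {
  int64_t min;
  int64_t max;
};

using KernelFn = MinMax (*)(const std::vector<int64_t>&);

// Hypothetical kernel templated on the SIMD level. The body never uses Level,
// so the AVX2 and AVX512 instantiations compute exactly the same thing. In a
// real build each instantiation would sit in a translation unit compiled with
// its own target flags, so the emitted machine code could still differ.
template <SimdLevel Level>
MinMax MinMaxKernel(const std::vector<int64_t>& values) {
  assert(!values.empty() && "sketch assumes non-empty input");
  MinMax r{values.front(), values.front()};
  for (int64_t v : values) {
    r.min = std::min(r.min, v);
    r.max = std::max(r.max, v);
  }
  return r;
}

struct RegisteredKernel {
  SimdLevel level;
  KernelFn fn;
};

// Register one kernel per SIMD level, as if runtime AVX2 and AVX512
// support were both compiled in (hypothetical registry).
std::vector<RegisteredKernel> BuildKernels() {
  return {
      {SimdLevel::NONE, &MinMaxKernel<SimdLevel::NONE>},
      {SimdLevel::AVX2, &MinMaxKernel<SimdLevel::AVX2>},
      {SimdLevel::AVX512, &MinMaxKernel<SimdLevel::AVX512>},
  };
}

// Pick the registered kernel with the highest level the CPU supports.
const RegisteredKernel* Dispatch(const std::vector<RegisteredKernel>& kernels,
                                 SimdLevel cpu_level) {
  const RegisteredKernel* best = nullptr;
  for (const auto& k : kernels) {
    if (k.level <= cpu_level && (best == nullptr || k.level > best->level)) {
      best = &k;
    }
  }
  assert(best != nullptr && "a NONE fallback should always be registered");
  return best;
}
```

Under these assumptions, an AVX512-capable CPU only ever selects the AVX512 entry, while an AVX2-only CPU falls back to the AVX2 entry; both return the same min/max. Whether the two instantiations are truly redundant in a real build depends on whether the AVX512 target flags actually change the generated code.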
