jianxind commented on pull request #7314: URL: https://github.com/apache/arrow/pull/7314#issuecomment-641109277
I just found a paper (https://dl.acm.org/doi/pdf/10.1145/3178433.3178435, page 7, Table 6: Comparison of various SIMD wrappers). Some SIMD helpers (simdpp/xsimd) have performance issues on at least some workloads.

Another thing is that most SIMD helpers have no runtime dispatch support, which means we would still have to build the same code in Arrow itself several times, once per instruction set (and only if we can find a common code path for a function at all), to select by runtime capability.

I have also been working on the sparse part of the aggregate sum recently, and the data flow is totally different for AVX2 and AVX512. AVX512 has _mm512_mask_add_pd (__m512d src, __mmask8 k, __m512d a, __m512d b), so the validity bitmap can be used directly as the mask of the SIMD add. AVX2 has to use a lookup table to expand the bits into a vector mask, then a SIMD AND to zero the invalid values before passing them to the SIMD add. The same difference will apply to other future SIMD functions too, since all Arrow data is represented with a validity bitmap.
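To make the AVX2/AVX512 difference concrete, here is a minimal sketch of the two data flows for a sum over doubles with a validity bitmap. It is not the code in this PR: the function names (SumValidScalar, SumValidAvx2, SumValidAvx512, SumValid) and the MaskTable helper are hypothetical, the bitmap is assumed to be LSB-first with no bit offset, and the per-function target attribute plus __builtin_cpu_supports dispatch assumes GCC or Clang on x86.

```cpp
#include <cstddef>
#include <cstdint>

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <immintrin.h>
#define SKETCH_X86 1
#endif

// Scalar reference: sum values[begin, end) whose validity bit is set.
// Assumes LSB-first bit order and no bitmap offset.
double SumValidScalar(const double* values, const uint8_t* valid_bits,
                      size_t begin, size_t end) {
  double sum = 0.0;
  for (size_t i = begin; i < end; ++i) {
    if ((valid_bits[i / 8] >> (i % 8)) & 1) sum += values[i];
  }
  return sum;
}

#ifdef SKETCH_X86
// Hypothetical lookup table: 4 validity bits -> 4 x 64-bit lane masks.
struct MaskTable {
  alignas(32) uint64_t lanes[16][4];
  MaskTable() {
    for (int nibble = 0; nibble < 16; ++nibble)
      for (int j = 0; j < 4; ++j)
        lanes[nibble][j] = ((nibble >> j) & 1) ? ~uint64_t{0} : uint64_t{0};
  }
};

// AVX2 flow: look up a vector mask, AND it with the values to zero the
// invalid slots, then do a plain SIMD add.
__attribute__((target("avx2")))
double SumValidAvx2(const double* values, const uint8_t* valid_bits, size_t n) {
  static const MaskTable table;
  __m256d acc = _mm256_setzero_pd();
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    unsigned nibble = (valid_bits[i / 8] >> (i % 8)) & 0xF;
    __m256d mask = _mm256_castsi256_pd(_mm256_loadu_si256(
        reinterpret_cast<const __m256i*>(table.lanes[nibble])));
    __m256d v = _mm256_loadu_pd(values + i);
    acc = _mm256_add_pd(acc, _mm256_and_pd(v, mask));
  }
  double lanes[4];
  _mm256_storeu_pd(lanes, acc);
  double sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
  return sum + SumValidScalar(values, valid_bits, i, n);  // scalar tail
}

// AVX512 flow: one bitmap byte is used directly as the __mmask8 of the add.
// Lanes with a cleared bit keep the accumulator value (src) unchanged.
__attribute__((target("avx512f")))
double SumValidAvx512(const double* values, const uint8_t* valid_bits, size_t n) {
  __m512d acc = _mm512_setzero_pd();
  size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    __mmask8 k = valid_bits[i / 8];
    __m512d v = _mm512_loadu_pd(values + i);
    acc = _mm512_mask_add_pd(acc, k, acc, v);
  }
  double lanes[8];
  _mm512_storeu_pd(lanes, acc);
  double sum = 0.0;
  for (int j = 0; j < 8; ++j) sum += lanes[j];
  return sum + SumValidScalar(values, valid_bits, i, n);  // scalar tail
}
#endif  // SKETCH_X86

// Runtime dispatch: each variant is compiled once, the best one is picked
// at run time from the CPU capability.
double SumValid(const double* values, const uint8_t* valid_bits, size_t n) {
#ifdef SKETCH_X86
  if (__builtin_cpu_supports("avx512f")) return SumValidAvx512(values, valid_bits, n);
  if (__builtin_cpu_supports("avx2")) return SumValidAvx2(values, valid_bits, n);
#endif
  return SumValidScalar(values, valid_bits, 0, n);
}
```

Note that the two SIMD bodies share almost nothing beyond the load and the accumulator, which is why a single wrapper-based code path is hard to find here. The SIMD variants also add in a different order than the scalar loop, so results can differ in the last bits for non-integer inputs.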