Dandandan edited a comment on pull request #10002: URL: https://github.com/apache/arrow/pull/10002#issuecomment-819747206
> I think this is a great idea @Dandandan
>
> I ran the benchmark tests on my laptop and I saw similar improvement numbers (master is 30% slower than this branch).
>
> I vote we :shipit:
>
> Given how effective this is, perhaps we can add a similar thing to other kernels (I could file a ticket if so)
>
> FYI @andygrove and @nevi-me

I think it would be useful to check which kernels / functions can benefit from more SIMD instructions. We might also need to experiment with things like `inline` attributes: non-inlined code might not benefit from the same optimizations, because in that case the same compiled function is reused for both code paths.

Also, auto-vectorization doesn't always work and, as we see here, it is not nearly as effective as the "manual" implementation (which is still ~4x faster!). As we are using packed_simd2, which also uses only the standard instruction set in `target-features` (i.e. sse2), I believe the same idea could be used there without resorting to emitting instructions unconditionally (like we do for the `avx_512` feature at the moment).

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
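As a rough illustration of the inlining point in the comment above, here is a minimal Rust sketch of runtime dispatch between two compiled copies of one loop. It is not the code from the pull request: the function names (`sum_impl`, `sum_avx2`, `sum_baseline`, `sum`) and the choice of AVX2 are assumptions made for this example. It relies only on the standard `#[target_feature]` attribute and the `is_x86_feature_detected!` macro.

```rust
// Hypothetical kernel; all names here are illustrative, not taken from Arrow.

/// Shared loop body. `#[inline(always)]` lets each caller below receive its
/// own copy, compiled with that caller's target features. Without inlining,
/// a single baseline copy would be shared by both code paths.
#[inline(always)]
fn sum_impl(values: &[f32]) -> f32 {
    // Eight independent accumulators give the compiler room to vectorize.
    let mut acc = [0.0f32; 8];
    let chunks = values.chunks_exact(8);
    let rem = chunks.remainder();
    for chunk in chunks {
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += *v;
        }
    }
    acc.iter().sum::<f32>() + rem.iter().sum::<f32>()
}

/// Copy of the loop compiled with AVX2 enabled (assumed feature for this sketch).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[f32]) -> f32 {
    sum_impl(values)
}

/// Copy of the loop compiled with the crate's default target features.
fn sum_baseline(values: &[f32]) -> f32 {
    sum_impl(values)
}

/// Picks a code path at runtime instead of emitting AVX2 unconditionally.
pub fn sum(values: &[f32]) -> f32 {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { sum_avx2(values) };
        }
    }
    sum_baseline(values)
}
```

Calling `sum(&values)` takes the AVX2 copy when the CPU reports support and the baseline copy otherwise. Whether the compiler actually vectorizes either copy is not guaranteed, so any speedup would need to be confirmed with benchmarks.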
