jhorstmann opened a new pull request #1146: URL: https://github.com/apache/arrow-rs/pull/1146
# Which issue does this PR close?

Implements comparison for SIMD types with fewer than 8 lanes.

Closes #1136.

# What changes are included in this PR?

This PR changes the comparison kernel so that the SIMD portion can always append 64 bits at a time. Since the SIMD types are 512 bits wide, the inner comparison is unrolled: for example 8 times for Float64 (8 x 8 lanes) or 4 times for Float32 (4 x 16 lanes). For Int8 types it is not unrolled, since one comparison already produces 64 bits.

This should even speed up the comparison kernel a bit for common types, because there is less loop overhead. On my laptop the SIMD version for the `i128` `MonthDayNano` types is not actually faster than the scalar version; on a more modern or server-class machine there should be a slight speedup.

Unrelated to this change, I also noticed that the code generation for non-AVX-512 machines is sub-optimal: the compiler has to emulate the 512-bit wide operations using smaller vector registers, and for the bitmap-generating code this has some overhead.

# Are there any user-facing changes?
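
To illustrate the "always append 64 bits at a time" idea described above, here is a minimal scalar sketch. It is not the code from this PR: the function names `lt_bitmask_64` and `lt_packed` are hypothetical, and a plain inner loop over `LANES` elements stands in for one 512-bit SIMD comparison so that the example builds on stable Rust without any SIMD crate. The point is only the packing scheme: `64 / LANES` comparisons are combined into one `u64` before anything is appended to the output bitmap.

```rust
/// Hypothetical helper: compares exactly 64 elements and packs the results
/// into one u64 (bit i is set when left[i] < right[i]).
/// `LANES` models the lane count of a 512-bit vector, e.g. 8 for f64,
/// 16 for f32, 64 for i8. The chunk loop runs 64 / LANES times, which is
/// the unrolling factor mentioned in the description.
fn lt_bitmask_64<T: PartialOrd + Copy, const LANES: usize>(left: &[T], right: &[T]) -> u64 {
    assert_eq!(left.len(), 64);
    assert_eq!(right.len(), 64);
    assert!(LANES > 0 && 64 % LANES == 0);

    let mut word = 0u64;
    for (i, (l, r)) in left
        .chunks_exact(LANES)
        .zip(right.chunks_exact(LANES))
        .enumerate()
    {
        // Stand-in for one SIMD comparison yielding a LANES-bit mask.
        let mut mask = 0u64;
        for lane in 0..LANES {
            mask |= ((l[lane] < r[lane]) as u64) << lane;
        }
        // Place this mask at its position inside the 64-bit word.
        word |= mask << (i * LANES);
    }
    word
}

/// Hypothetical driver: produces a packed "less than" bitmap for two slices
/// of equal length, one u64 per 64 elements, with a scalar tail.
fn lt_packed<T: PartialOrd + Copy, const LANES: usize>(left: &[T], right: &[T]) -> Vec<u64> {
    assert_eq!(left.len(), right.len());

    let mut out = Vec::with_capacity((left.len() + 63) / 64);
    let mut l_chunks = left.chunks_exact(64);
    let mut r_chunks = right.chunks_exact(64);

    // Main loop: every iteration appends exactly 64 bits.
    for (l, r) in l_chunks.by_ref().zip(r_chunks.by_ref()) {
        out.push(lt_bitmask_64::<T, LANES>(l, r));
    }

    // Remainder: fewer than 64 elements, handled one element at a time.
    let (l_rem, r_rem) = (l_chunks.remainder(), r_chunks.remainder());
    if !l_rem.is_empty() {
        let mut word = 0u64;
        for (i, (a, b)) in l_rem.iter().zip(r_rem).enumerate() {
            word |= ((a < b) as u64) << i;
        }
        out.push(word);
    }
    out
}
```

Under these assumptions, calling `lt_packed::<f64, 8>(&left, &right)` runs the chunk loop 8 times per output word, while `lt_packed::<i8, 64>(&left, &right)` runs it once, mirroring the Float64 and Int8 cases in the description. Because every full iteration writes a whole `u64`, the output never needs sub-word bit offsets, which is where the reduced loop overhead would come from.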
