rluvaton opened a new pull request, #17977: URL: https://github.com/apache/datafusion/pull/17977
## Which issue does this PR close?

N/A

## Rationale for this change

Making multi-column aggregation even faster.

## What changes are included in this PR?

In `PrimitiveGroupValueBuilder.vectorized_equal_to`, always evaluate the comparison and use unchecked indexing. Both of these changes are what make the code compile to SIMD.

## Are these changes tested?

Existing tests.

## Are there any user-facing changes?

Nope.

-----

I tried a LOT of variations ([GodBolt](https://godbolt.org/z/Kc8ze6E9n)). These ranged from splitting into fixed-size chunks, to trying to get auto-vectorization to use gather and build a bitmask, to even testing portable SIMD (just to see what it would generate).

This version only optimizes the non-null path for the moment, as it is the easiest.

Once (and if) we change from `&mut [bool]` to mutable packed bits, we could:

1. Evaluate in chunks of `64` items. I tried different variations to see what is best (you can tweak the Godbolt link above with different types and sizes to check for yourself). 64 is not necessarily the best, but I think it will be the fastest for doing an AND with the `equal_to_results` boolean buffer.
2. Add an optimization for the nullable path as well, by doing bitwise operations on 64 items at a time and avoiding the cost of getting each bit manually.
3. Skip 64 items right away if the `equal_to_results` chunk equals `0x00` (i.e. all false).
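A simplified sketch of the non-null change to `PrimitiveGroupValueBuilder.vectorized_equal_to` may help readers see the idea. It assumes plain slices in place of DataFusion's Arrow arrays, and `equal_to_branchy` / `equal_to_branchless` are hypothetical names, not the PR's actual code. The branchy loop skips rows already known to be unequal. The branchless loop always compares and folds the result in with a non-short-circuiting `&`, while `get_unchecked` removes the bounds checks on the row-index loads.

```rust
/// Hypothetical simplified baseline: skips rows already marked unequal
/// and uses bounds-checked indexing everywhere.
pub fn equal_to_branchy<T: Copy + PartialEq>(
    group_values: &[T],
    lhs_rows: &[usize],
    array_values: &[T],
    rhs_rows: &[usize],
    equal_to_results: &mut [bool],
) {
    for (idx, (&lhs, &rhs)) in lhs_rows.iter().zip(rhs_rows).enumerate() {
        // Data-dependent branch per row; this tends to block auto-vectorization.
        if !equal_to_results[idx] {
            continue;
        }
        equal_to_results[idx] = group_values[lhs] == array_values[rhs];
    }
}

/// Hypothetical branchless variant: always evaluate the comparison and
/// AND it into the previous result.
///
/// # Safety
/// Every `lhs_rows[i]` must be `< group_values.len()` and every
/// `rhs_rows[i]` must be `< array_values.len()`.
pub unsafe fn equal_to_branchless<T: Copy + PartialEq>(
    group_values: &[T],
    lhs_rows: &[usize],
    array_values: &[T],
    rhs_rows: &[usize],
    equal_to_results: &mut [bool],
) {
    let n = equal_to_results.len();
    // Validate lengths once; the loop then reads the row slices unchecked.
    assert!(lhs_rows.len() == n && rhs_rows.len() == n);
    for i in 0..n {
        // SAFETY: i < n == lhs_rows.len() == rhs_rows.len(), and the row
        // indices are in bounds by the caller's contract.
        let eq = unsafe {
            let lhs = *lhs_rows.get_unchecked(i);
            let rhs = *rhs_rows.get_unchecked(i);
            *group_values.get_unchecked(lhs) == *array_values.get_unchecked(rhs)
        };
        // `&=` on bool does not short-circuit, so no branch is introduced.
        equal_to_results[i] &= eq;
    }
}
```

Both functions produce the same results. Whether LLVM actually emits SIMD for the branchless loop depends on the element type and target, so the Godbolt link above is the place to verify it for a given configuration.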

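The proposed packed-bits follow-up (points 1 and 3 in the list above) might look roughly like the following sketch. It assumes a hypothetical `&mut [u64]` results buffer with one bit per row, LSB-first as in Arrow bitmaps. `equal_to_packed` is an illustrative name, the nullable path (point 2) is not shown, and bounds-checked indexing is kept for clarity.

```rust
/// Hypothetical packed-bits layout: bit `i % 64` of `equal_to_results[i / 64]`
/// holds the equality result for row `i` (LSB-first, as in Arrow bitmaps).
pub fn equal_to_packed<T: Copy + PartialEq>(
    group_values: &[T],
    lhs_rows: &[usize],
    array_values: &[T],
    rhs_rows: &[usize],
    equal_to_results: &mut [u64],
) {
    let n = lhs_rows.len();
    assert_eq!(rhs_rows.len(), n);
    assert_eq!(equal_to_results.len(), n.div_ceil(64));

    for (chunk_idx, word) in equal_to_results.iter_mut().enumerate() {
        // Point 3: all 64 rows in this chunk are already unequal, so skip them.
        if *word == 0 {
            continue;
        }
        let start = chunk_idx * 64;
        let end = (start + 64).min(n);
        // Point 1: build a 64-bit comparison mask for the chunk, then AND once.
        let mut mask = 0u64;
        for (bit, row) in (start..end).enumerate() {
            let eq = group_values[lhs_rows[row]] == array_values[rhs_rows[row]];
            mask |= (eq as u64) << bit;
        }
        *word &= mask;
    }
}
```

Building the mask first and applying a single AND per word keeps the inner loop free of writes to the results buffer. The all-zero check lets a fully mismatched chunk cost one comparison instead of 64.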