alamb opened a new issue, #18676: URL: https://github.com/apache/datafusion/issues/18676
### Is your feature request related to a problem or challenge?

@rluvaton notes on https://github.com/apache/datafusion/pull/17977:

> once and if we change from &mut [bool] to mutable packed bits we could:
>
> 1. evaluate in chunks of 64 items (I tried different variations to see what is the best - you can tweak in the godbolt above with different type and size to check for yourself), 64 is not necessarily the best but it will be the fastest I think for doing AND with the equal_to_results boolean buffer
> 2. add optimization for nullable as well by just doing bitwise operation at 64 items at a time and avoid the cost of getting each bit manually
> 3. skip 64 items right away if the equal_to_results equal to 0x00 (i.e. all false)

I believe he is referring to this code: https://github.com/apache/datafusion/blob/73038f50dd7e1086a36589b4a5ee1e8db18b96f3/datafusion/physical-plan/src/aggregates/group_values/multi_group_by/mod.rs#L71-L86

### Describe the solution you'd like

So basically, rather than passing in `&mut [bool]`, it would take a `BooleanBufferBuilder` or something equivalent.

```rust
fn vectorized_equal_to(
    &self,
    lhs_rows: &[usize],
    array: &ArrayRef,
    rhs_rows: &[usize],
    // Pass in some sort of packed bitmask representation rather than `&mut [bool]`.
    // It must be mutable because the comparison results are ANDed into it.
    equal_to_results: &mut BooleanBufferBuilder,
);
```

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
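
For illustration, points 1 and 3 of the quoted notes could look roughly like the sketch below. This is not the DataFusion implementation and does not use the arrow `BooleanBufferBuilder` API: the function name `and_equal_chunked`, the plain `&[i64]` value slices, and the `&mut [u64]` bitmask (bit `i` stored in word `i / 64` at position `i % 64`) are all assumptions made to keep the example self-contained.

```rust
/// Hypothetical sketch: AND the result of `lhs_values[lhs_rows[i]] == rhs_values[rhs_rows[i]]`
/// into bit `i` of a packed bitmask, processing 64 rows per `u64` word.
fn and_equal_chunked(
    lhs_rows: &[usize],
    lhs_values: &[i64],
    rhs_rows: &[usize],
    rhs_values: &[i64],
    equal_to_results: &mut [u64],
) {
    assert_eq!(lhs_rows.len(), rhs_rows.len());
    assert!(equal_to_results.len() * 64 >= lhs_rows.len());

    let chunks = lhs_rows.chunks(64).zip(rhs_rows.chunks(64));
    for (word, (lhs_chunk, rhs_chunk)) in equal_to_results.iter_mut().zip(chunks) {
        // Point 3: if all 64 previous results are already false, nothing
        // in this chunk can become true again, so skip the comparisons.
        if *word == 0 {
            continue;
        }

        // Point 1: build one 64-bit mask for the whole chunk.
        let mut mask = 0u64;
        for (bit, (&l, &r)) in lhs_chunk.iter().zip(rhs_chunk).enumerate() {
            mask |= ((lhs_values[l] == rhs_values[r]) as u64) << bit;
        }

        // A single AND replaces up to 64 per-element `bool` updates.
        // Bits past the end of a short final chunk are cleared.
        *word &= mask;
    }
}
```

With validity bitmaps stored in the same packed layout, the nullable case (point 2) could in principle be handled by combining whole words with bitwise operations before the final AND, although gathering validity bits through `lhs_rows` / `rhs_rows` would still need per-row indexing.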
