Dandandan commented on code in PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#discussion_r1818154221
##########
datafusion/physical-plan/src/aggregates/group_values/column.rs:
##########
@@ -125,6 +233,292 @@ impl GroupValuesColumn {
| DataType::BinaryView
)
}
+
+ /// Collect vectorized context by checking hash values of `cols` in `map`
+ ///
+ /// 1. If bucket not found
+ /// - Build and insert the `new inlined group index view`
+ /// and its hash value to `map`
+ /// - Add row index to `vectorized_append_row_indices`
+ /// - Set group index to row in `groups`
+ ///
+ /// 2. bucket found
+ /// - Add row index to `vectorized_equal_to_row_indices`
+ /// - Check if the `group index view` is `inlined` or `non_inlined`:
+ /// If it is inlined, add to `vectorized_equal_to_group_indices` directly.
+ /// Otherwise get all group indices from `group_index_lists`, and add them.
+ ///
+ fn collect_vectorized_process_context(
+ &mut self,
+ batch_hashes: &[u64],
+ groups: &mut Vec<usize>,
+ ) {
+ self.vectorized_append_row_indices.clear();
+ self.vectorized_equal_to_row_indices.clear();
+ self.vectorized_equal_to_group_indices.clear();
+
+ let mut group_values_len = self.group_values[0].len();
+ for (row, &target_hash) in batch_hashes.iter().enumerate() {
+ let entry = self.map.get(target_hash, |(exist_hash, _)| {
Review Comment:
It should be slightly faster to avoid doing both `get` and `insert` here. If
the double lookup turns out to be expensive, you could use
`find_or_find_insert_slot` and `insert_in_slot` so the table is probed only once.
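
To illustrate the single-probe idea, here is a minimal sketch that uses the standard library's `HashMap::entry` as an analogy. It is not the PR's code and does not use the raw-table methods named above. `assign_groups` and `new_rows` are hypothetical names, and keying the map directly by `u64` is a simplification of the `(hash, group index view)` entries in the diff.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Assign a group index to every key while probing the map once per row.
/// Returns (group index for each row, rows that created a new group).
fn assign_groups(keys: &[u64]) -> (Vec<usize>, Vec<usize>) {
    let mut map: HashMap<u64, usize> = HashMap::new();
    let mut groups = Vec::with_capacity(keys.len());
    let mut new_rows = Vec::new();

    for (row, &key) in keys.iter().enumerate() {
        // The next free group index, read before the map is borrowed by `entry`.
        let next_group = map.len();
        // One lookup: `entry` either finds the key or keeps the vacant slot,
        // so inserting does not need a second probe.
        match map.entry(key) {
            Entry::Occupied(e) => groups.push(*e.get()),
            Entry::Vacant(e) => {
                e.insert(next_group);
                new_rows.push(row);
                groups.push(next_group);
            }
        }
    }
    (groups, new_rows)
}
```

Calling `assign_groups(&[7, 3, 7, 9, 3])` gives the rows with keys 7, 3 and 9 new group indices in order of first appearance and reuses those indices for the repeated keys. The `get` followed by `insert` version would hash and probe twice for each new key.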
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]