Rachelint commented on code in PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#discussion_r1807827680


##########
datafusion/physical-plan/src/aggregates/group_values/column.rs:
##########
@@ -160,11 +162,112 @@ macro_rules! instantiate_primitive {
     };
 }
 
-fn append_col_value<C>(mut core: C, array: &ArrayRef, row: usize)
-where
-    C: FnMut(&ArrayRef, usize),
-{
-    core(array, row);
+struct AggregationHashTable<T: AggregationHashTableEntry> {
+    /// Raw table storing values in a `Vec`
+    raw_table: Vec<T>,

Review Comment:
   > Based on some experiments in changing hash join algorithm, I think it's likely `hashbrown` performs much better than implementing a hashtable ourselves although I would like to be surprised 🙂
   
   🤔 Even if we can perform something like `vectorized compare` or `vectorized append` in our hashtable?
   
   I found that in the `multi group by` case, we perform the `compare` for each row, which means downcasting the array again and again... and the `downcast` operation actually compiles to many asm instructions...
   
   And I found we can't eliminate that and perform a `vectorized compare` with `hashbrown`...
   
   ```rust
       fn equal_to_inner(&self, lhs_row: usize, array: &ArrayRef, rhs_row: usize) -> bool {
           let array = array.as_byte_view::<B>();
   ```
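   
   A minimal sketch of the point above (illustrative only, not from the PR; `Column` and `StringColumn` are hypothetical stand-ins for arrow's `Array` types): the per-row path repeats the downcast on every comparison, while the vectorized path downcasts once and then compares a whole batch of rows.
   
   ```rust
   use std::any::Any;
   
   // Hypothetical column trait standing in for arrow's dyn Array (illustrative only).
   trait Column: Any {
       fn as_any(&self) -> &dyn Any;
   }
   
   struct StringColumn {
       values: Vec<String>,
   }
   
   impl Column for StringColumn {
       fn as_any(&self) -> &dyn Any {
           self
       }
   }
   
   // Per-row path: the downcast happens on every call, as in `equal_to_inner`.
   fn equal_to_per_row(lhs: &StringColumn, lhs_row: usize, rhs: &dyn Column, rhs_row: usize) -> bool {
       let rhs = rhs.as_any().downcast_ref::<StringColumn>().unwrap();
       lhs.values[lhs_row] == rhs.values[rhs_row]
   }
   
   // Vectorized path: downcast once, then compare a whole batch of row pairs.
   fn equal_to_vectorized(lhs: &StringColumn, rhs: &dyn Column, rows: &[(usize, usize)]) -> Vec<bool> {
       let rhs = rhs.as_any().downcast_ref::<StringColumn>().unwrap();
       rows.iter()
           .map(|&(l, r)| lhs.values[l] == rhs.values[r])
           .collect()
   }
   
   fn main() {
       let a = StringColumn { values: vec!["x".into(), "y".into()] };
       let b = StringColumn { values: vec!["x".into(), "z".into()] };
       let results = equal_to_vectorized(&a, &b, &[(0, 0), (1, 1)]);
       println!("{:?}", results); // [true, false]
       assert!(equal_to_per_row(&a, 0, &b, 0));
   }
   ```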



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

