Dandandan commented on code in PR #16136:
URL: https://github.com/apache/datafusion/pull/16136#discussion_r2100719962


##########
datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs:
##########
@@ -74,21 +77,21 @@ macro_rules! hash_float {
 
 hash_float!(f16, f32, f64);
 
-/// A [`GroupValues`] storing a single column of primitive values
+/// A [`GroupValues`] storing a single column of normal primitive values (bits 
<= 64)
 ///
 /// This specialization is significantly faster than using the more general
 /// purpose `Row`s format
 pub struct GroupValuesPrimitive<T: ArrowPrimitiveType> {
     /// The data type of the output array
     data_type: DataType,
-    /// Stores the `(group_index, hash)` based on the hash of its value
+    /// Stores the `(group_index, group_value)`
     ///
-    /// We also store `hash` is for reducing cost of rehashing. Such cost
-    /// is obvious in high cardinality group by situation.
+    /// We directly store copy of `group_value` for not only efficient
+    /// rehashing, but also efficient probing.
     /// More details can see:
-    /// <https://github.com/apache/datafusion/issues/15961>
+    /// <https://github.com/apache/datafusion/issues/16136>
     ///
-    map: HashTable<(usize, u64)>,
+    map: HashTable<(usize, T::Native)>,

Review Comment:
   Is storing `values: Vec<T::Native>` in this case necessary? 
   It could be rebuild in `emit_internal` by traversing the map items I think 
(create a `Vec` and update by group index). This avoids doing that while 
inserting.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to