jorgecarleitao commented on a change in pull request #68:
URL: https://github.com/apache/arrow-datafusion/pull/68#discussion_r646083187



##########
File path: datafusion/src/physical_plan/hash_aggregate.rs
##########
@@ -339,6 +325,36 @@ pin_project! {
     }
 }
 
+fn hash_(group_values: &[ArrayRef]) -> Result<MutableBuffer<u64>> {
+    // compute the hashes
+    // todo: we should be able to use `MutableBuffer<u64>` to compute the hash and ^ them without
+    // allocating all the hashes before ^ them
+    let hashes = group_values
+        .iter()
+        .map(|x| {
+            let a = match x.data_type() {
+                DataType::Dictionary(_, d) => {
+                    // todo: think about how to perform this more efficiently
+                    // * first hash, then unpack
+                    // * do not unpack at all, and instead figure out a way to leverage dictionary-encoded.
+                    let unpacked = arrow2::compute::cast::cast(x.as_ref(), d)?;
+                    arrow2::compute::hash::hash(unpacked.as_ref())
+                }
+                _ => arrow2::compute::hash::hash(x.as_ref()),
+            };
+            Ok(a?)
+        })
+        .collect::<Result<Vec<_>>>()?;
+    let hash = MutableBuffer::<u64>::from(hashes[0].values());
+
+    Ok(hashes.iter().skip(1).fold(hash, |mut acc, x| {
+        acc.iter_mut()
+            .zip(x.values().iter())
+            .for_each(|(hash, other)| *hash ^= other);

Review comment:
       Thanks @Dandandan, this is now migrated; it uses the same approach that `hash_join` does.
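       For readers outside the PR, the underlying idea is to hash each group-by column row-by-row and then fold the per-column hashes into a single `u64` per row, so every row ends up with one combined grouping hash. The snippet below is a minimal, std-only sketch of that pattern; the helper name `combine_column_hashes`, the plain `Vec<i64>` columns (standing in for `ArrayRef`), and `DefaultHasher` (standing in for arrow2's hash kernel) are all illustrative assumptions, not DataFusion's actual API, and the real hash-join code may combine per-column hashes differently than the plain XOR fold used in the diff above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: hash several i64 columns row-wise and XOR-fold the
/// per-column hashes into one u64 per row. All columns are assumed to have
/// the same length.
fn combine_column_hashes(columns: &[Vec<i64>]) -> Vec<u64> {
    let num_rows = columns.first().map(|c| c.len()).unwrap_or(0);
    let mut combined = vec![0u64; num_rows];
    for column in columns {
        for (row, value) in column.iter().enumerate() {
            let mut hasher = DefaultHasher::new();
            value.hash(&mut hasher);
            // XOR-fold this column's hash into the running per-row hash,
            // mirroring the fold over `hashes` in the diff above.
            combined[row] ^= hasher.finish();
        }
    }
    combined
}

fn main() {
    // Two group-by columns; rows 0 and 2 have identical group keys,
    // so their combined hashes must match.
    let group_columns = vec![vec![1_i64, 2, 1], vec![10_i64, 20, 10]];
    let hashes = combine_column_hashes(&group_columns);
    assert_eq!(hashes[0], hashes[2]);
    println!("{:?}", hashes);
}
```

       One design note: XOR-folding keeps the combining step cheap and order-independent across columns, which is why the diff can fold the per-column hash buffers one after another without tracking column order.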



