Jefffrey commented on code in PR #19500:
URL: https://github.com/apache/datafusion/pull/19500#discussion_r2707508060


##########
datafusion/common/src/hash_utils.rs:
##########
@@ -513,24 +514,41 @@ fn hash_list_array<OffsetSize>(
 where
     OffsetSize: OffsetSizeTrait,
 {
-    let values = array.values();
-    let offsets = array.value_offsets();
-    let nulls = array.nulls();
-    let mut values_hashes = vec![0u64; values.len()];
-    create_hashes([values], random_state, &mut values_hashes)?;
-    if let Some(nulls) = nulls {
-        for (i, (start, stop)) in offsets.iter().zip(offsets.iter().skip(1)).enumerate() {
-            if nulls.is_valid(i) {
+    // In case values is sliced, hash only the bytes used by the offsets of this ListArray
+    let first_offset = array.value_offsets().first().cloned().unwrap_or_default();
+    let last_offset = array.value_offsets().last().cloned().unwrap_or_default();
+    let value_bytes_len = (last_offset - first_offset).as_usize();
+    let mut values_hashes = vec![0u64; value_bytes_len];

Review Comment:
   Do you mean reusing a buffer between invocations of `hash_list_array()`
itself? We could look into that, but I'd say that's beyond the scope of the
changes here, especially as the other functions don't do this, so it might
need more plumbing.
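
For illustration only, here is a minimal std-only sketch of the two ideas discussed: hashing only the child values covered by the list's offsets, and letting the caller own a scratch buffer so its allocation can be reused between calls. It does not use arrow or DataFusion types. `hash_one`, `hash_list_rows`, the `scratch` parameter, and the hash combiner are all hypothetical stand-ins, not the code in this PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash one value with the standard library's default hasher. This is an
/// assumed stand-in for the `random_state`-seeded hashing in the real code.
fn hash_one(value: i64) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: hash each list row, touching only the child values
/// in `first..last`. `scratch` is caller-owned so that its allocation can be
/// reused across calls. `row_hashes` must hold `offsets.len() - 1` entries.
fn hash_list_rows(
    values: &[i64],
    offsets: &[usize],
    scratch: &mut Vec<u64>,
    row_hashes: &mut [u64],
) {
    let first = offsets.first().copied().unwrap_or_default();
    let last = offsets.last().copied().unwrap_or_default();

    // Reuse the scratch allocation; only `last - first` slots are filled.
    scratch.clear();
    scratch.extend(values[first..last].iter().map(|v| hash_one(*v)));

    // Offsets are absolute positions in `values`, so rebase them by `first`.
    for (row, bounds) in offsets.windows(2).enumerate() {
        let (start, stop) = (bounds[0] - first, bounds[1] - first);
        for value_hash in &scratch[start..stop] {
            // Arbitrary combiner chosen for the sketch.
            row_hashes[row] = row_hashes[row].wrapping_mul(31).wrapping_add(*value_hash);
        }
    }
}
```

With offsets `[2, 4, 6]` over six child values, only four hashes are computed, and the resulting row hashes match the last two rows produced with offsets `[0, 2, 4, 6]`. Threading a buffer like `scratch` through the real call chain is the extra plumbing mentioned above.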



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

