jayzhan211 commented on code in PR #8552:
URL: https://github.com/apache/arrow-datafusion/pull/8552#discussion_r1430762650


##########
datafusion/common/src/hash_utils.rs:
##########
@@ -207,6 +208,32 @@ fn hash_dictionary<K: ArrowDictionaryKeyType>(
     Ok(())
 }
 
+fn hash_struct_array(
+    array: &StructArray,
+    random_state: &RandomState,
+    hashes_buffer: &mut [u64],
+) -> Result<()> {
+    let nulls = array.nulls();
+    let num_columns = array.num_columns();
+
+    // Skip null columns
+    let valid_indices: Vec<usize> = if let Some(nulls) = nulls {
+        nulls.valid_indices().collect()
+    } else {
+        (0..num_columns).collect()
+    };
+
+    let mut values_hashes = vec![0u64; array.len()];
+    create_hashes(array.columns(), random_state, &mut values_hashes)?;

Review Comment:
   I don't think so.
   
   values_hashes: [9258723240401068087, 9258723240401068087, 8502738074356456021, 8502738074356456021, 4222447303697976283, 9753707356376286577]
   hashes_buffer: [9258723240401091360, 9258723240401091360, 0, 8502738074356479294, 0, 0]
   
   We can see that `create_hashes` does not consider whether a row is valid:
it produces a hash for every row, including the null ones. Therefore, we
still need to iterate over `valid_indices` once.
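
To illustrate the point, here is a minimal std-only sketch, not the PR's code. It assumes a single `i32` child column and a `&[bool]` validity mask in place of the Arrow null buffer; `hash_all_rows`, `combine_valid_rows`, and `demo` are hypothetical names, and `DefaultHasher` stands in for the real `RandomState`. Hashing the children fills every slot, so the buffer is only correct for null rows if we write to it through the valid indices.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for hashing the child columns: it hashes every
/// row and knows nothing about the parent struct's validity.
fn hash_all_rows(values: &[i32]) -> Vec<u64> {
    values
        .iter()
        .map(|v| {
            let mut hasher = DefaultHasher::new();
            v.hash(&mut hasher);
            hasher.finish()
        })
        .collect()
}

/// Copy child hashes into the output buffer for valid rows only.
/// Null rows keep whatever the buffer already held (0 here).
fn combine_valid_rows(values_hashes: &[u64], validity: &[bool], hashes_buffer: &mut [u64]) {
    for (i, is_valid) in validity.iter().enumerate() {
        if *is_valid {
            hashes_buffer[i] = values_hashes[i];
        }
    }
}

/// Builds a six-row example with rows 2, 4 and 5 marked null.
fn demo() -> (Vec<u64>, Vec<u64>) {
    let values = [1, 1, 2, 2, 3, 4];
    let validity = [true, true, false, true, false, false];

    // Every row gets a hash, including the null ones.
    let values_hashes = hash_all_rows(&values);

    // Only valid rows are written to the output buffer.
    let mut hashes_buffer = vec![0u64; values.len()];
    combine_valid_rows(&values_hashes, &validity, &mut hashes_buffer);

    (values_hashes, hashes_buffer)
}
```

Calling `demo()` returns both vectors: rows 2 and 3 share a child hash because they hold the same value, but only row 3 is written to the buffer, which matches the shape of the output above.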



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
