martin-g commented on code in PR #18981:
URL: https://github.com/apache/datafusion/pull/18981#discussion_r2571542572


##########
datafusion/common/src/cast.rs:
##########
@@ -334,3 +335,8 @@ pub fn as_list_view_array(array: &dyn Array) -> 
Result<&ListViewArray> {
 pub fn as_large_list_view_array(array: &dyn Array) -> 
Result<&LargeListViewArray> {
     Ok(downcast_value!(array, LargeListViewArray))
 }
+
+// Downcast ArrayRef to RunArray
+pub fn as_run_array<R: RunEndIndexType>(array: &dyn Array) -> 
Result<&RunArray<R>> {

Review Comment:
   Is this function still needed ?
   Its usage has been replaced with `downcast_run_array!()` and now it is not 
used anywhere.



##########
datafusion/common/src/hash_utils.rs:
##########
@@ -484,6 +485,40 @@ fn hash_fixed_list_array(
     Ok(())
 }
 
+#[cfg(not(feature = "force_hash_collisions"))]
+fn hash_run_array<R: RunEndIndexType>(
+    array: &RunArray<R>,
+    random_state: &RandomState,
+    hashes_buffer: &mut [u64],
+    rehash: bool,
+) -> Result<()> {
+    let values = array.values();
+    let values_len = values.len();
+    let mut values_hashes = vec![0u64; values_len];
+    create_hashes(&[Arc::clone(values)], random_state, &mut values_hashes)?;
+
+    let run_ends = array.run_ends();
+    let mut prev_run_end = 0;
+
+    for (i, value_hash) in values_hashes.iter().enumerate().take(values_len) {
+        let run_end = run_ends.values()[i].as_usize();
+
+        if rehash {

Review Comment:
   The other functions check whether the current value is valid/non-null before 
hashing it when `rehash == true`. Is this needed here too ?



##########
datafusion/common/src/hash_utils.rs:
##########
@@ -484,6 +485,40 @@ fn hash_fixed_list_array(
     Ok(())
 }
 
+#[cfg(not(feature = "force_hash_collisions"))]
+fn hash_run_array<R: RunEndIndexType>(
+    array: &RunArray<R>,
+    random_state: &RandomState,
+    hashes_buffer: &mut [u64],
+    rehash: bool,
+) -> Result<()> {
+    let values = array.values();
+    let values_len = values.len();
+    let mut values_hashes = vec![0u64; values_len];
+    create_hashes(&[Arc::clone(values)], random_state, &mut values_hashes)?;
+
+    let run_ends = array.run_ends();
+    let mut prev_run_end = 0;
+
+    for (i, value_hash) in values_hashes.iter().enumerate().take(values_len) {
+        let run_end = run_ends.values()[i].as_usize();

Review Comment:
   This may fail for a sliced RunArray.
   Sliced RunArray has the same `values` but sliced `run_ends`, so their 
indices do not match 1:1.
   
https://docs.rs/arrow-array/latest/src/arrow_array/array/run_array.rs.html#247-253



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to