martin-g commented on code in PR #18981:
URL: https://github.com/apache/datafusion/pull/18981#discussion_r2571542572
##########
datafusion/common/src/cast.rs:
##########
@@ -334,3 +335,8 @@ pub fn as_list_view_array(array: &dyn Array) ->
Result<&ListViewArray> {
pub fn as_large_list_view_array(array: &dyn Array) ->
Result<&LargeListViewArray> {
Ok(downcast_value!(array, LargeListViewArray))
}
+
+// Downcast ArrayRef to RunArray
+pub fn as_run_array<R: RunEndIndexType>(array: &dyn Array) ->
Result<&RunArray<R>> {
Review Comment:
Is this function still needed ?
Its usage has been replaced with `downcast_run_array!()` and now it is not
used anywhere.
##########
datafusion/common/src/hash_utils.rs:
##########
@@ -484,6 +485,40 @@ fn hash_fixed_list_array(
Ok(())
}
+#[cfg(not(feature = "force_hash_collisions"))]
+fn hash_run_array<R: RunEndIndexType>(
+ array: &RunArray<R>,
+ random_state: &RandomState,
+ hashes_buffer: &mut [u64],
+ rehash: bool,
+) -> Result<()> {
+ let values = array.values();
+ let values_len = values.len();
+ let mut values_hashes = vec![0u64; values_len];
+ create_hashes(&[Arc::clone(values)], random_state, &mut values_hashes)?;
+
+ let run_ends = array.run_ends();
+ let mut prev_run_end = 0;
+
+ for (i, value_hash) in values_hashes.iter().enumerate().take(values_len) {
+ let run_end = run_ends.values()[i].as_usize();
+
+ if rehash {
Review Comment:
The other functions check whether the current value is valid/non-null before
hashing it when `rehash == true`. Is this needed here too ?
##########
datafusion/common/src/hash_utils.rs:
##########
@@ -484,6 +485,40 @@ fn hash_fixed_list_array(
Ok(())
}
+#[cfg(not(feature = "force_hash_collisions"))]
+fn hash_run_array<R: RunEndIndexType>(
+ array: &RunArray<R>,
+ random_state: &RandomState,
+ hashes_buffer: &mut [u64],
+ rehash: bool,
+) -> Result<()> {
+ let values = array.values();
+ let values_len = values.len();
+ let mut values_hashes = vec![0u64; values_len];
+ create_hashes(&[Arc::clone(values)], random_state, &mut values_hashes)?;
+
+ let run_ends = array.run_ends();
+ let mut prev_run_end = 0;
+
+ for (i, value_hash) in values_hashes.iter().enumerate().take(values_len) {
+ let run_end = run_ends.values()[i].as_usize();
Review Comment:
This may fail for a sliced RunArray.
Sliced RunArray has the same `values` but sliced `run_ends`, so their
indices do not match 1:1.
https://docs.rs/arrow-array/latest/src/arrow_array/array/run_array.rs.html#247-253
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]