alamb commented on code in PR #8716:
URL: https://github.com/apache/arrow-rs/pull/8716#discussion_r2466791218


##########
arrow-cast/src/cast/run_array.rs:
##########
@@ -134,16 +134,8 @@ pub(crate) fn cast_to_run_end_encoded<K: RunEndIndexType>(
         ));
     }
 
-    // Partition the array to identify runs of consecutive equal values
-    let partitions = partition(&[Arc::clone(cast_array)])?;
-    let mut run_ends = Vec::new();

Review Comment:
   I looked briefly at a profile for this function -- I think we could make it 
substantially faster by reducing allocations with a pre-sized vector here (use 
`partitions.count_ones()` to know how many partitions are needed)
   
   <img width="1748" height="712" alt="Screenshot 2025-10-27 at 3 05 53 PM" src="https://github.com/user-attachments/assets/8ca8e12c-ed08-4076-b47e-5e80f7a02c2b" />
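
   A minimal sketch of the pre-sizing idea, using only the standard library 
rather than the arrow-rs types. The `run_ends_from_boundaries` helper and the 
`&[bool]` boundary mask are hypothetical stand-ins for whatever `partition` 
returns; the point is only that counting the set bits first lets the vector be 
allocated once instead of growing repeatedly.

```rust
/// Build exclusive run-end offsets from a boundary mask.
///
/// Assumption for this sketch: `boundaries[i] == true` means a new run starts
/// between element `i` and element `i + 1`, so an input of `len` elements has
/// a mask of length `len - 1`.
fn run_ends_from_boundaries(boundaries: &[bool], len: usize) -> Vec<usize> {
    if len == 0 {
        return Vec::new();
    }
    // Count the set bits once so the vector is allocated a single time.
    let num_runs = boundaries.iter().filter(|&&b| b).count() + 1;
    let mut run_ends = Vec::with_capacity(num_runs);
    for (idx, &is_boundary) in boundaries.iter().enumerate() {
        if is_boundary {
            // A run covering elements up to and including `idx` ends at `idx + 1`.
            run_ends.push(idx + 1);
        }
    }
    // The final run always ends at the array length.
    run_ends.push(len);
    run_ends
}
```

   For the values `[1, 1, 2, 2, 2, 3]` the mask would be 
`[false, true, false, false, true]`, giving three runs and a single allocation 
of capacity three.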
   



##########
arrow-cast/src/cast/run_array.rs:
##########
@@ -162,3 +154,23 @@ pub(crate) fn cast_to_run_end_encoded<K: RunEndIndexType>(
     let run_array = RunArray::<K>::try_new(&run_ends_array, values_array.as_ref())?;
     Ok(Arc::new(run_array))
 }
+
+fn compute_run_boundaries(array: &ArrayRef) -> (Vec<usize>, Vec<usize>) {
+    let mut run_ends = Vec::new();
+    let mut values_indexes = Vec::new();
+    if array.is_empty() {
+        return (run_ends, values_indexes);
+    }
+    values_indexes.push(0);
+    let mut current_data = array.slice(0, 1).to_data();
+    for idx in 1..array.len() {
+        let next_data = array.slice(idx, 1).to_data();

Review Comment:
   I think this is likely to be substantially slower than what `partition` 
does, but we can see what the benchmarks show
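
   To illustrate the concern: in the hunk above, every loop iteration calls 
`array.slice(idx, 1).to_data()`, which creates a new one-element array and its 
data for each row before comparing. The sketch below is not the arrow-rs 
implementation; it is a standard-library analogue over a typed slice 
(the generic `&[T]` signature is an assumption) showing the same boundary 
computation with adjacent elements compared in place and no per-element 
allocation.

```rust
/// Compute run boundaries over a typed slice by comparing neighbours in place.
///
/// Returns `(run_ends, values_indexes)`, where `run_ends` holds the exclusive
/// end offset of each run and `values_indexes` holds the index of the first
/// element of each run.
fn compute_run_boundaries<T: PartialEq>(values: &[T]) -> (Vec<usize>, Vec<usize>) {
    let mut run_ends = Vec::new();
    let mut values_indexes = Vec::new();
    if values.is_empty() {
        return (run_ends, values_indexes);
    }
    values_indexes.push(0);
    for idx in 1..values.len() {
        // Direct comparison of adjacent elements; nothing is allocated here.
        if values[idx] != values[idx - 1] {
            run_ends.push(idx);
            values_indexes.push(idx);
        }
    }
    // Close the final run at the end of the input.
    run_ends.push(values.len());
    (run_ends, values_indexes)
}
```

   Whether a typed fast path like this, or the existing `partition` kernel, 
wins in practice is something only the benchmarks can settle.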



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
