tustvold commented on code in PR #3622:
URL: https://github.com/apache/arrow-rs/pull/3622#discussion_r1090525144


##########
arrow-select/src/take.rs:
##########
@@ -810,6 +816,70 @@ where
     Ok(DictionaryArray::<T>::from(data))
 }
 
+macro_rules! primitive_run_take {
+    ($t:ty, $o:ty, $indices:ident, $value:ident) => {
+        take_primitive_run_values::<$o, $t>(
+            $indices,
+            as_primitive_array::<$t>($value.values()),
+        )
+    };
+}
+
+/// `take` implementation for run arrays
+///
+/// performs binary search on `run_ends` to get physical indices for the given logical indices.
+/// builds output run array by taking values in the input run array at the physical indices.
+/// for e.g. an input `RunArray{ run_ends = [2,4,6,8], values=[1,2,1,2] }` and `indices=[2,7]`
+/// would be converted to `physical_indices=[1,3]` which will be used to build
+/// output `RunArray{ run_ends=[2], values=[2] }`
+
+pub fn take_run<T, I>(
+    run_array: &RunArray<T>,
+    logical_indices: &PrimitiveArray<I>,
+) -> Result<RunArray<T>, ArrowError>
+where
+    T: RunEndIndexType,
+    T::Native: num::Num,
+    I: ArrowPrimitiveType,
+    I::Native: ToPrimitive,
+{
+    match run_array.data_type() {
+        DataType::RunEndEncoded(_, fl) => {
+            let physical_indices =
+                run_array.get_physical_indices(logical_indices.values())?;
+            downcast_primitive! {
+                fl.data_type() => (primitive_run_take, T, physical_indices, run_array),
+                dt => Err(ArrowError::NotYetImplemented(format!("take_run is not implemented for {dt:?}")))
+            }
+        }
+        dt => Err(ArrowError::InvalidArgumentError(format!(
+            "Expected DataType::RunEndEncoded found {dt:?}"
+        ))),
+    }
+}
+// Builds a `RunArray` by taking values from given array for the given indices.
+fn take_primitive_run_values<R, V>(

Review Comment:
   Yeah, this wasn't quite what I was suggesting. I'll try to find some time to 
bash out what I mean later in the week. In particular, this is still typed on 
the values, which is what I want to avoid.
   
   I want to first build the `new_run_ends` and the corresponding physical 
indices (as a `PrimitiveArray`), and then use the `take` kernel with the physical 
indices to construct the child data. This generalises trivially to all value 
types and does not need to use `PrimitiveRunBuilder` at all. It should also be 
faster, as it does not need to do value comparisons.
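
A rough sketch of the idea described above, written against plain slices rather than the arrow-rs types so that it stands alone. The names `take_run_indices` and `take_values` are hypothetical and are not part of the PR or of arrow-rs; `take_values` is only a stand-in for the real `take` kernel. The sketch assumes `run_ends` is strictly increasing and merges consecutive output positions only when they map to the same physical index, so no value comparison is needed:

```rust
/// Hypothetical helper: maps logical indices to `(new_run_ends, physical_indices)`
/// without ever looking at the values.
/// Returns `None` if a logical index is past the last run end.
fn take_run_indices(
    run_ends: &[usize],
    logical_indices: &[usize],
) -> Option<(Vec<usize>, Vec<usize>)> {
    let mut new_run_ends: Vec<usize> = Vec::new();
    let mut physical_indices: Vec<usize> = Vec::new();

    for (pos, &logical) in logical_indices.iter().enumerate() {
        // Binary search: first run whose end is strictly greater than `logical`.
        let physical = run_ends.partition_point(|&end| end <= logical);
        if physical == run_ends.len() {
            return None; // logical index out of bounds
        }

        let same_run = physical_indices.last() == Some(&physical);
        if same_run {
            // Same physical run as the previous output position: extend it.
            if let Some(last_end) = new_run_ends.last_mut() {
                *last_end = pos + 1;
            }
        } else {
            // New physical run: start a new output run ending at `pos + 1`.
            physical_indices.push(physical);
            new_run_ends.push(pos + 1);
        }
    }
    Some((new_run_ends, physical_indices))
}

/// Stand-in for a generic `take` kernel: works for any value type.
fn take_values<V: Clone>(values: &[V], indices: &[usize]) -> Vec<V> {
    indices.iter().map(|&i| values[i].clone()).collect()
}
```

With the example from the doc comment (`run_ends = [2,4,6,8]`, `values = [1,2,1,2]`, `indices = [2,7]`), this yields physical indices `[1,3]` and output run ends `[1,2]` with values `[2,2]`. That is a valid run-end encoding but not a minimal one: adjacent runs that happen to hold equal values stay separate, which is the trade-off for skipping value comparisons.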



