Rich-T-kid commented on code in PR #9856:
URL: https://github.com/apache/arrow-rs/pull/9856#discussion_r3189957425
##########
arrow-select/src/interleave.rs:
##########
@@ -411,6 +417,70 @@ fn interleave_list<O: OffsetSizeTrait>(
Ok(Arc::new(list_array))
}
+/// Specialized [`interleave`] for [`RunArray`].
+fn interleave_run_end<R: RunEndIndexType>(
+ values: &[&dyn Array],
+ indices: &[(usize, usize)],
+) -> Result<ArrayRef, ArrowError> {
+ if indices.is_empty() {
+ return Ok(new_empty_array(values[0].data_type()));
+ }
+
+ let n = indices.len();
+ R::Native::from_usize(n).ok_or_else(|| {
+ ArrowError::ComputeError(format!(
+ "interleave_run_end: output length {n} does not fit run-end type"
+ ))
+ })?;
+
+ let runs: Vec<&RunArray<R>> = values.iter().map(|a| a.as_run::<R>()).collect();
+ let value_arrays: Vec<&dyn Array> = runs.iter().map(|r| r.values().as_ref()).collect();
+
+ // Resolve each (array, logical_row) to (array, physical_row), so we can
+ // lookup physical indices by batch.
+ let mut phys_pairs: Vec<(usize, usize)> = vec![(0, 0); n];
+ let mut grouped: Vec<(Vec<R::Native>, Vec<usize>)> =
+ (0..runs.len()).map(|_| (Vec::new(), Vec::new())).collect();
+ for (out_pos, &(arr, row)) in indices.iter().enumerate() {
+ let row = R::Native::from_usize(row).ok_or_else(|| {
+ ArrowError::InvalidArgumentError(format!(
+ "interleave_run_end: row index {row} out of range"
Review Comment:
I'm also a bit confused about this check. You're checking whether the row is
out of bounds, but couldn't you do that by comparing the row against the length
of the array, like
`let current_array = values[arr];`
`if row >= current_array.len() { return Err(ArrowError::InvalidArgumentError(format!("row index {row} out of range"))); }`
For example:
` let mut builder = PrimitiveRunBuilder::<Int16Type, Int16Type>::new();
builder.extend([0, 0, 0, 1, 1, 0, 0, 1, 1, 1].into_iter().map(Some));
let a = builder.finish();
let mut builder = PrimitiveRunBuilder::<Int16Type, Int16Type>::new();
builder.extend([2, 2, 1, 1, 1, 0, 1, 0, 0, 0].into_iter().map(Some));
let b = builder.finish();
// logical: [1, 1, 1, 1, 1] across an a→b boundary; should compact to one run.
// row 32766 is out of bounds for a (length 10) but still fits in i16 (i16::MAX is 32767)
let result = interleave(&[&a, &b], &[(0, 32766), (0, 4), (1, 2), (1, 3), (1, 4)]).unwrap();
let result = result.as_run::<Int16Type>();`
This code returns an error, but the error comes from the call to
**get_physical_indices**(),
`let phys = runs[arr_idx].get_physical_indices(&logical_rows)?;`, not from
the validation step that you're doing within the loop.
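
To make the distinction concrete, here is a minimal std-only sketch of the two checks. Both helper names are hypothetical and are not part of arrow-rs. The first only mirrors what a `from_usize`-style conversion can tell you (whether the index fits the run-end type), and the second is the length comparison suggested above.

```rust
/// Hypothetical helper: does `row` fit in the run-end native type (i16 here)?
/// This is all a `from_usize`-style conversion can verify.
fn fits_in_i16(row: usize) -> bool {
    i16::try_from(row).is_ok()
}

/// Hypothetical helper: is `row` a valid logical index into an array
/// holding `len` logical rows?
fn check_row_in_bounds(row: usize, len: usize) -> Result<usize, String> {
    if row >= len {
        return Err(format!(
            "row index {row} out of range for array of length {len}"
        ));
    }
    Ok(row)
}
```

With the values from the example, `fits_in_i16(32766)` passes even though `check_row_in_bounds(32766, 10)` fails, so a conversion check alone would not reject that row.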