brunal commented on code in PR #9621:
URL: https://github.com/apache/arrow-rs/pull/9621#discussion_r3000474728


##########
arrow-ord/src/cmp.rs:
##########
@@ -701,10 +828,37 @@ pub fn compare_byte_view<T: ByteViewType>(
     unsafe { GenericByteViewArray::compare_unchecked(left, left_idx, right, 
right_idx) }
 }
 
+/// Run-end encoding metadata for one side of a comparison. Only stores the
+/// physical run-end positions (m elements), not a logical-length index vector.
+struct ReeInfo {
+    run_ends: Vec<usize>,
+    offset: usize,
+    start_physical: usize,
+    len: usize,
+}
+
+/// If `array` is RunEndEncoded, return its physical values array and run 
metadata.
+fn ree_unwrap(array: &dyn Array) -> Option<(&dyn Array, ReeInfo)> {

Review Comment:
   I don't like the unwrap name too much as there no `unwrap()` call :)



##########
arrow-ord/src/cmp.rs:
##########
@@ -479,6 +570,42 @@ fn apply_op_vectored<T: ArrayOrd>(
     })
 }
 
+/// Segment-based comparison for two REE arrays. Walks both run_ends arrays
+/// simultaneously, comparing once per aligned segment and bulk-filling.
+fn apply_op_segments<T: ArrayOrd>(
+    l: T,
+    l_info: &ReeInfo,
+    r: T,
+    r_info: &ReeInfo,
+    neg: bool,
+    op: fn(T::Item, T::Item) -> bool,
+) -> BooleanBuffer {
+    let end = l_info.offset + l_info.len;
+    let mut builder = BooleanBufferBuilder::new(l_info.len);
+    let mut l_phys = l_info.start_physical;
+    let mut r_phys = r_info.start_physical;
+    let mut pos = l_info.offset;
+
+    while pos < end {
+        let l_end = l_info.run_ends[l_phys].min(end);

Review Comment:
   you could also go unsafe here imho



##########
arrow-ord/src/cmp.rs:
##########
@@ -701,10 +828,37 @@ pub fn compare_byte_view<T: ByteViewType>(
     unsafe { GenericByteViewArray::compare_unchecked(left, left_idx, right, 
right_idx) }
 }
 
+/// Run-end encoding metadata for one side of a comparison. Only stores the
+/// physical run-end positions (m elements), not a logical-length index vector.
+struct ReeInfo {
+    run_ends: Vec<usize>,
+    offset: usize,
+    start_physical: usize,
+    len: usize,
+}
+
+/// If `array` is RunEndEncoded, return its physical values array and run 
metadata.
+fn ree_unwrap(array: &dyn Array) -> Option<(&dyn Array, ReeInfo)> {
+    downcast_run_array!(
+        array => {
+            let run_ends = array.run_ends();
+            let info = ReeInfo {
+                run_ends: run_ends.values().iter().map(|v| 
v.as_usize()).collect(),

Review Comment:
   It's a bit annoying that you're copying all the indices into a vec. No 
convenient way to have a `&dyn` like for dicts?



##########
arrow-ord/src/cmp.rs:
##########
@@ -327,28 +336,41 @@ fn compare_op(op: Op, lhs: &dyn Datum, rhs: &dyn Datum) 
-> Result<BooleanArray,
 }
 
 /// Perform a potentially vectored `op` on the provided `ArrayOrd`
+#[allow(clippy::too_many_arguments)]
 fn apply<T: ArrayOrd>(
     op: Op,
     l: T,
     l_s: bool,

Review Comment:
   Unless I'm mistaken, you can have only one of l_s == true / l_v.is_some() / 
l_ree.is_some() at a time. How about introducing an enum? It will reduce the 
arg # and make them clearer



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to