(arrow-rs) branch main updated: Speed up string view comparison (up to 3x) (#9250)

dheres Fri, 23 Jan 2026 08:15:04 -0800

This is an automated email from the ASF dual-hosted git repository.

dheres pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-rs.git



The following commit(s) were added to refs/heads/main by this push:
     new 0258ec0aa2 Speed up string view comparison (up to 3x) (#9250)
0258ec0aa2 is described below

commit 0258ec0aa2bd8acae9df41f44b75699c335cbfa1
Author: Daniël Heres <[email protected]>
AuthorDate: Fri Jan 23 17:14:50 2026 +0100

    Speed up string view comparison (up to 3x) (#9250)
    
    # Which issue does this PR close?
    
    <!--
    We generally require a GitHub issue to be filed for all bug fixes and
    enhancements and this helps us generate change logs for our releases.
    You can link an issue to this PR using the GitHub syntax.
    -->
    
    - Closes #9253
    
    # Rationale for this change
    
    I saw this was a bit hot in profiles.
    
    <details>
    
    ```
    lt scalar StringViewArray
                            time:   [18.997 ms 19.046 ms 19.122 ms]
                            change: [−46.019% −45.725% −45.407%] (p = 0.00 < 
0.05)
                            Performance has improved.
    Found 2 outliers among 100 measurements (2.00%)
      1 (1.00%) high mild
      1 (1.00%) high severe
    
    eq scalar StringViewArray 4 bytes
                            time:   [4.1758 ms 4.2382 ms 4.3220 ms]
                            change: [−65.309% −64.797% −64.047%] (p = 0.00 < 
0.05)
                            Performance has improved.
    Found 5 outliers among 100 measurements (5.00%)
      1 (1.00%) high mild
      4 (4.00%) high severe
    
    eq scalar StringViewArray 6 bytes
                            time:   [4.1705 ms 4.1841 ms 4.2011 ms]
                            change: [−65.384% −65.155% −64.926%] (p = 0.00 < 
0.05)
                            Performance has improved.
    Found 2 outliers among 100 measurements (2.00%)
      1 (1.00%) high mild
      1 (1.00%) high severe
    
    eq scalar StringViewArray 13 bytes
                            time:   [6.6653 ms 6.6951 ms 6.7368 ms]
                            change: [−45.014% −44.679% −44.284%] (p = 0.00 < 
0.05)
                            Performance has improved.
    Found 5 outliers among 100 measurements (5.00%)
      3 (3.00%) high mild
      2 (2.00%) high severe
    
    eq StringViewArray StringViewArray
                            time:   [9.0877 ms 9.1225 ms 9.1626 ms]
                            change: [−29.753% −29.416% −29.044%] (p = 0.00 < 
0.05)
                            Performance has improved.
    Found 2 outliers among 100 measurements (2.00%)
      1 (1.00%) high mild
      1 (1.00%) high severe
    
    eq StringViewArray StringViewArray inlined bytes
                            time:   [4.7619 ms 4.7771 ms 4.8014 ms]
                            change: [−63.522% −63.288% −63.041%] (p = 0.00 < 
0.05)
                            Performance has improved.
    Found 19 outliers among 100 measurements (19.00%)
      12 (12.00%) high mild
      7 (7.00%) high severe
    
    lt StringViewArray StringViewArray inlined bytes
                            time:   [12.219 ms 12.308 ms 12.445 ms]
                            change: [−24.196% −23.586% −22.697%] (p = 0.00 < 
0.05)
                            Performance has improved.
    Found 36 outliers among 100 measurements (36.00%)
      2 (2.00%) low severe
      13 (13.00%) low mild
      2 (2.00%) high mild
      19 (19.00%) high severe
    
    eq long same prefix strings StringViewArray
                            time:   [639.53 µs 642.16 µs 645.97 µs]
                            change: [−27.425% −26.845% −26.240%] (p = 0.00 < 
0.05)
                            Performance has improved.
    Found 1 outliers among 100 measurements (1.00%)
      1 (1.00%) high severe
    
    neq long same prefix strings StringViewArray
                            time:   [641.40 µs 642.99 µs 644.94 µs]
                            change: [−27.323% −26.696% −26.066%] (p = 0.00 < 
0.05)
                            Performance has improved.
    Found 9 outliers among 100 measurements (9.00%)
      7 (7.00%) high mild
      2 (2.00%) high severe
    
    lt long same prefix strings StringViewArray
                            time:   [674.98 µs 679.64 µs 687.89 µs]
                            change: [−22.103% −21.388% −20.647%] (p = 0.00 < 
0.05)
                            Performance has improved.
    Found 3 outliers among 100 measurements (3.00%)
      1 (1.00%) high mild
      2 (2.00%) high severe
    ```
    
    </details>
    
    # What changes are included in this PR?
    
    # Are these changes tested?
    
    <!--
    We typically require tests for all PRs in order to:
    1. Prevent the code from being accidentally broken by subsequent changes
    2. Serve as another way to document the expected behavior of the code
    
    If tests are not included in your PR, please explain why (for example,
    are they covered by existing tests)?
    -->
    
    # Are there any user-facing changes?
    
    <!--
    If there are user-facing changes then we may require documentation to be
    updated before approving the PR.
    
    If there are any breaking changes to public APIs, please call them out.
    -->
---
 arrow-array/src/array/byte_view_array.rs |  85 ++---------------
 arrow-ord/src/cmp.rs                     | 159 +++++++++++++++++++++++++++++--
 2 files changed, 160 insertions(+), 84 deletions(-)

diff --git a/arrow-array/src/array/byte_view_array.rs 
b/arrow-array/src/array/byte_view_array.rs
index c517129a17..2d5d207b82 100644
--- a/arrow-array/src/array/byte_view_array.rs
+++ b/arrow-array/src/array/byte_view_array.rs
@@ -793,85 +793,20 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
     /// key("bar\0") = 0x0000000000000000000062617200000004
     /// ⇒ key("bar") < key("bar\0")
     /// ```
-    /// # Inlining and Endianness
+    /// - `raw` is treated as a 128-bit integer with its bits laid out as 
follows:
+    ///   - bits 0–31: length (little-endian)
+    ///   - bits 32–127: data (little-endian)
     ///
-    /// - We start by calling `.to_le_bytes()` on the `raw` `u128`, because 
Rust’s native in‑memory
-    ///   representation is little‑endian on x86/ARM.
-    /// - We extract the low 32 bits numerically (`raw as u32`)—this step is 
endianness‑free.
-    /// - We copy the 12 bytes of inline data (original order) into 
`buf[0..12]`.
-    /// - We serialize `length` as big‑endian into `buf[12..16]`.
-    /// - Finally, `u128::from_be_bytes(buf)` treats `buf[0]` as the most 
significant byte
-    ///   and `buf[15]` as the least significant, producing a `u128` whose 
integer value
-    ///   directly encodes “inline data then length” in big‑endian form.
+    /// # Inlining and Endianness
     ///
-    /// This ensures that a simple `u128` comparison is equivalent to the 
desired
-    /// lexicographical comparison of the inline bytes followed by length.
+    /// This function uses platform-independent bitwise operations to 
construct a 128-bit key:
+    /// - `raw.swap_bytes() << 32` effectively clears the length bits and 
shifts the 12-byte inline data
+    ///   into the high 96 bits in Big-Endian order. This ensures the first 
byte of the string
+    ///   is the most significant byte of the resulting `u128`.
+    /// - `raw as u32` extracts the length as a numeric integer, which is then 
placed in the low 32 bits.
     #[inline(always)]
     pub fn inline_key_fast(raw: u128) -> u128 {
-        // 1. Decompose `raw` into little‑endian bytes:
-        //    - raw_bytes[0..4]  = length in LE
-        //    - raw_bytes[4..16] = inline string data
-        let raw_bytes = raw.to_le_bytes();
-
-        // 2. Numerically truncate to get the low 32‑bit length 
(endianness‑free).
-        let length = raw as u32;
-
-        // 3. Build a 16‑byte buffer in big‑endian order:
-        //    - buf[0..12]  = inline string bytes (in original order)
-        //    - buf[12..16] = length.to_be_bytes() (BE)
-        let mut buf = [0u8; 16];
-        buf[0..12].copy_from_slice(&raw_bytes[4..16]); // inline data
-
-        // Why convert length to big-endian for comparison?
-        //
-        // Rust (on most platforms) stores integers in little-endian format,
-        // meaning the least significant byte is at the lowest memory address.
-        // For example, an u32 value like 0x22345677 is stored in memory as:
-        //
-        //   [0x77, 0x56, 0x34, 0x22]  // little-endian layout
-        //    ^     ^     ^     ^
-        //  LSB   ↑↑↑           MSB
-        //
-        // This layout is efficient for arithmetic but *not* suitable for
-        // lexicographic (dictionary-style) comparison of byte arrays.
-        //
-        // To compare values by byte order—e.g., for sorted keys or binary 
trees—
-        // we must convert them to **big-endian**, where:
-        //
-        //   - The most significant byte (MSB) comes first (index 0)
-        //   - The least significant byte (LSB) comes last (index N-1)
-        //
-        // In big-endian, the same u32 = 0x22345677 would be represented as:
-        //
-        //   [0x22, 0x34, 0x56, 0x77]
-        //
-        // This ordering aligns with natural string/byte sorting, so calling
-        // `.to_be_bytes()` allows us to construct
-        // keys where standard numeric comparison (e.g., `<`, `>`) behaves
-        // like lexicographic byte comparison.
-        buf[12..16].copy_from_slice(&length.to_be_bytes()); // length in BE
-
-        // 4. Deserialize the buffer as a big‑endian u128:
-        //    buf[0] is MSB, buf[15] is LSB.
-        // Details:
-        // Note on endianness and layout:
-        //
-        // Although `buf[0]` is stored at the lowest memory address,
-        // calling `u128::from_be_bytes(buf)` interprets it as the **most 
significant byte (MSB)**,
-        // and `buf[15]` as the **least significant byte (LSB)**.
-        //
-        // This is the core principle of **big-endian decoding**:
-        //   - Byte at index 0 maps to bits 127..120 (highest)
-        //   - Byte at index 1 maps to bits 119..112
-        //   - ...
-        //   - Byte at index 15 maps to bits 7..0 (lowest)
-        //
-        // So even though memory layout goes from low to high (left to right),
-        // big-endian treats the **first byte** as highest in value.
-        //
-        // This guarantees that comparing two `u128` keys is equivalent to 
lexicographically
-        // comparing the original inline bytes, followed by length.
-        u128::from_be_bytes(buf)
+        (raw.swap_bytes() << 32) | (raw as u32 as u128)
     }
 }
 
diff --git a/arrow-ord/src/cmp.rs b/arrow-ord/src/cmp.rs
index 30943ede4a..18e7e9e8fb 100644
--- a/arrow-ord/src/cmp.rs
+++ b/arrow-ord/src/cmp.rs
@@ -573,36 +573,84 @@ impl<'a, T: ByteViewType> ArrayOrd for &'a 
GenericByteViewArray<T> {
             return l_view == r_view;
         }
 
+        // Fast path for same view (and both inlined)
+        if l_view == r_view && *l_view as u32 <= 12 {
+            return true;
+        }
+
         let l_len = *l_view as u32;
         let r_len = *r_view as u32;
-        // This is a fast path for equality check.
-        // We don't need to look at the actual bytes to determine if they are 
equal.
+        // Lengths differ
         if l_len != r_len {
             return false;
         }
-        if l_len == 0 && r_len == 0 {
+
+        // Both are empty
+        if l_len == 0 {
             return true;
         }
 
+        // Check prefix
+        if (*l_view >> 32) as u32 != (*r_view >> 32) as u32 {
+            return false;
+        }
+
+        // Both are inlined, and prefixes are equal (so they differ in rest of 
inlined bytes)
+        if l_len <= 12 {
+            return false;
+        }
+
         // # Safety
         // The index is within bounds as it is checked in value()
-        unsafe { GenericByteViewArray::compare_unchecked(l.0, l.1, r.0, 
r.1).is_eq() }
+        unsafe {
+            let l_buffer_idx = (*l_view >> 64) as u32;
+            let l_offset = (*l_view >> 96) as u32;
+            let r_buffer_idx = (*r_view >> 64) as u32;
+            let r_offset = (*r_view >> 96) as u32;
+
+            let l_data = l.0.data_buffers().get_unchecked(l_buffer_idx as 
usize);
+            let r_data = r.0.data_buffers().get_unchecked(r_buffer_idx as 
usize);
+
+            let l_slice = l_data
+                .as_slice()
+                .get_unchecked(l_offset as usize..(l_offset + l_len) as usize);
+            let r_slice = r_data
+                .as_slice()
+                .get_unchecked(r_offset as usize..(r_offset + r_len) as usize);
+            l_slice == r_slice
+        }
     }
 
     #[inline(always)]
     fn is_lt(l: Self::Item, r: Self::Item) -> bool {
-        // If both arrays use only the inline buffer
+        let l_view = unsafe { l.0.views().get_unchecked(l.1) };
+        let r_view = unsafe { r.0.views().get_unchecked(r.1) };
+
         if l.0.data_buffers().is_empty() && r.0.data_buffers().is_empty() {
-            let l_view = unsafe { l.0.views().get_unchecked(l.1) };
-            let r_view = unsafe { r.0.views().get_unchecked(r.1) };
+            // For lt case, we can directly compare the inlined bytes
+            return GenericByteViewArray::<T>::inline_key_fast(*l_view)
+                < GenericByteViewArray::<T>::inline_key_fast(*r_view);
+        }
+
+        if (*l_view as u32) <= 12 && (*r_view as u32) <= 12 {
             return GenericByteViewArray::<T>::inline_key_fast(*l_view)
                 < GenericByteViewArray::<T>::inline_key_fast(*r_view);
         }
 
-        // Fallback to the generic, unchecked comparison for non-inline cases
+        let l_prefix = (*l_view >> 32) as u32;
+        let r_prefix = (*r_view >> 32) as u32;
+        if l_prefix != r_prefix {
+            return l_prefix.swap_bytes() < r_prefix.swap_bytes();
+        }
+
+        // Fallback to the generic, unchecked comparison for mixed cases
         // # Safety
         // The index is within bounds as it is checked in value()
-        unsafe { GenericByteViewArray::compare_unchecked(l.0, l.1, r.0, 
r.1).is_lt() }
+        unsafe {
+            let l_data: &[u8] = l.0.value_unchecked(l.1).as_ref();
+            let r_data: &[u8] = r.0.value_unchecked(r.1).as_ref();
+            l_data < r_data
+        }
     }
 
     fn len(&self) -> usize {
@@ -815,4 +863,97 @@ mod tests {
 
         neq(&col.slice(0, col.len() - 1), &col.slice(1, col.len() - 
1)).unwrap();
     }
+
+    #[test]
+    fn test_string_view_mixed_lt() {
+        let a = arrow_array::StringViewArray::from(vec![
+            Some("apple"),
+            Some("apple"),
+            Some("apple_long_string"),
+        ]);
+        let b = arrow_array::StringViewArray::from(vec![
+            Some("apple_long_string"),
+            Some("appl"),
+            Some("apple"),
+        ]);
+        // "apple" < "apple_long_string" -> true
+        // "apple" < "appl" -> false
+        // "apple_long_string" < "apple" -> false
+        assert_eq!(
+            lt(&a, &b).unwrap(),
+            BooleanArray::from(vec![true, false, false])
+        );
+    }
+
+    #[test]
+    fn test_string_view_eq() {
+        let a = arrow_array::StringViewArray::from(vec![
+            Some("hello"),
+            Some("world"),
+            None,
+            Some("very long string exceeding 12 bytes"),
+        ]);
+        let b = arrow_array::StringViewArray::from(vec![
+            Some("hello"),
+            Some("world"),
+            None,
+            Some("very long string exceeding 12 bytes"),
+        ]);
+        assert_eq!(
+            eq(&a, &b).unwrap(),
+            BooleanArray::from(vec![Some(true), Some(true), None, Some(true)])
+        );
+
+        let c = arrow_array::StringViewArray::from(vec![
+            Some("hello"),
+            Some("world!"),
+            None,
+            Some("very long string exceeding 12 bytes!"),
+        ]);
+        assert_eq!(
+            eq(&a, &c).unwrap(),
+            BooleanArray::from(vec![Some(true), Some(false), None, 
Some(false)])
+        );
+    }
+
+    #[test]
+    fn test_string_view_lt() {
+        let a = arrow_array::StringViewArray::from(vec![
+            Some("apple"),
+            Some("banana"),
+            Some("very long apple exceeding 12 bytes"),
+            Some("very long banana exceeding 12 bytes"),
+        ]);
+        let b = arrow_array::StringViewArray::from(vec![
+            Some("banana"),
+            Some("apple"),
+            Some("very long banana exceeding 12 bytes"),
+            Some("very long apple exceeding 12 bytes"),
+        ]);
+        assert_eq!(
+            lt(&a, &b).unwrap(),
+            BooleanArray::from(vec![true, false, true, false])
+        );
+    }
+
+    #[test]
+    fn test_compare_byte_view() {
+        let a = arrow_array::StringViewArray::from(vec![
+            Some("apple"),
+            Some("banana"),
+            Some("very long apple exceeding 12 bytes"),
+            Some("very long banana exceeding 12 bytes"),
+        ]);
+        let b = arrow_array::StringViewArray::from(vec![
+            Some("apple"),
+            Some("apple"),
+            Some("very long apple exceeding 12 bytes"),
+            Some("very long apple exceeding 12 bytes"),
+        ]);
+
+        assert_eq!(compare_byte_view(&a, 0, &b, 0), Ordering::Equal);
+        assert_eq!(compare_byte_view(&a, 1, &b, 1), Ordering::Greater);
+        assert_eq!(compare_byte_view(&a, 2, &b, 2), Ordering::Equal);
+        assert_eq!(compare_byte_view(&a, 3, &b, 3), Ordering::Greater);
+    }
 }

(arrow-rs) branch main updated: Speed up string view comparison (up to 3x) (#9250)

Reply via email to