Re: [PR] refactor: remove `arrow-ord` dependency in `arrow-cast` [arrow-rs]

via GitHub Tue, 28 Oct 2025 01:23:53 -0700


Weijun-H commented on PR #8716:
URL: https://github.com/apache/arrow-rs/pull/8716#issuecomment-3455136174


   After several rounds of optimization, the current version cuts benchmark 
time by roughly 7% to 35% relative to the previous one (results below).
    - Type-specialized dispatch:
   `compute_run_boundaries` now routes each physical layout (boolean, primitive 
scalars, binary/string, etc.) to a dedicated helper, allowing most arrays to 
bypass the slow, generic `ArrayData` comparison path.
    - Chunked primitive scanning:
   The no-null primitive path uses `scan_run_end`, which compares 16 bytes at a 
time via `u128` loads. When a chunk differs, it falls back to scalar 
iteration. This reduces branches and bounds checks in the hot loop.
    - Targeted use of unsafe for performance:
   Tight loops leverage `get_unchecked`, `from_raw_parts`, and `read_unaligned` 
to eliminate redundant bounds and alignment checks. Each unsafe block includes 
detailed safety comments describing the invariants upheld.
    - `RunBoundaryAccumulator`:
   A lightweight helper that pre-allocates capacity using a `len / 64 + 2` 
heuristic and expands as needed. All run-detection routines share this 
consistent and efficient allocation strategy.
    - Integrated null handling:
   Boolean, primitive, and binary paths now detect value and validity 
transitions in a single scan, avoiding temporary bitmap construction for null 
detection.
    - Generic fallback:
   Less common types still rely on `ArrayData` equality but reuse the shared 
accumulator to produce consistent run and value outputs, without special-casing 
memory management.
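
The chunked primitive scan described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the body of `scan_run_end` and the helper `run_ends` are assumptions, the element type is fixed to `i32`, and safe byte copies stand in for the `read_unaligned` loads mentioned above. It reuses the `len / 64 + 2` capacity heuristic from the comment.

```rust
/// Hypothetical sketch: returns the index one past the end of the run that
/// starts at `start`, for a slice with no nulls. `start` must be in bounds.
fn scan_run_end(values: &[i32], start: usize) -> usize {
    let first = values[start];

    // Build a 16-byte pattern holding four copies of the run value.
    let mut pattern_bytes = [0u8; 16];
    for lane in pattern_bytes.chunks_exact_mut(4) {
        lane.copy_from_slice(&first.to_ne_bytes());
    }
    let pattern = u128::from_ne_bytes(pattern_bytes);

    let mut i = start + 1;
    // Fast path: compare four i32 values (16 bytes) per iteration.
    while i + 4 <= values.len() {
        let mut chunk = [0u8; 16];
        for (lane, v) in chunk.chunks_exact_mut(4).zip(&values[i..i + 4]) {
            lane.copy_from_slice(&v.to_ne_bytes());
        }
        if u128::from_ne_bytes(chunk) != pattern {
            break; // the run ends somewhere inside this chunk
        }
        i += 4;
    }
    // Scalar fallback: finds the exact boundary and handles the tail.
    while i < values.len() && values[i] == first {
        i += 1;
    }
    i
}

/// Hypothetical driver: collects the exclusive end index of every run.
fn run_ends(values: &[i32]) -> Vec<usize> {
    // Capacity heuristic taken from the comment: len / 64 + 2.
    let mut ends = Vec::with_capacity(values.len() / 64 + 2);
    let mut start = 0;
    while start < values.len() {
        let end = scan_run_end(values, start);
        ends.push(end);
        start = end;
    }
    ends
}
```

For example, `run_ends(&[1, 1, 1, 1, 1, 1, 2, 2, 3])` yields the run ends `[6, 8, 9]`. The PR reportedly avoids the per-chunk copies by loading 16 bytes directly from the buffer with unsafe reads; the safe version here only shows the control flow.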
   
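
The integrated null handling can be illustrated in the same spirit. The sketch below is an assumption-laden simplification: `run_ends_with_nulls` is a hypothetical name, validity is modelled as a `&[bool]` rather than an Arrow bitmap, and the element type is fixed to `i32`. It shows one pass that treats a change in validity or a change in value as a run boundary, without building a temporary bitmap.

```rust
/// Hypothetical sketch: single-pass run detection over values plus validity.
/// `validity[i] == false` means slot `i` is null; its value is ignored.
fn run_ends_with_nulls(values: &[i32], validity: &[bool]) -> Vec<usize> {
    assert_eq!(values.len(), validity.len());
    let len = values.len();
    // Same capacity heuristic as described in the comment.
    let mut ends = Vec::with_capacity(len / 64 + 2);
    if len == 0 {
        return ends;
    }
    for i in 1..len {
        let same_run = match (validity[i - 1], validity[i]) {
            // Both valid: compare the actual values.
            (true, true) => values[i - 1] == values[i],
            // Both null: consecutive nulls form one run.
            (false, false) => true,
            // Validity transition: always a boundary.
            _ => false,
        };
        if !same_run {
            ends.push(i);
        }
    }
    // The last run always ends at the array length.
    ends.push(len);
    ends
}
```

With values `[5, 5, 0, 0, 7, 7]` and validity `[true, true, false, false, true, true]`, this returns `[2, 4, 6]`: one run of `5`, one run of nulls, and one run of `7`.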
   
   ```
   cast string single run to ree<int32>
                           time:   [23.143 µs 23.180 µs 23.224 µs]
                           change: [−8.5926% −6.6138% −5.2622%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     1 (1.00%) low mild
     3 (3.00%) high mild
     9 (9.00%) high severe
   
   cast runs of 10 string to ree<int32>
                           time:   [4.4857 µs 4.4924 µs 4.4999 µs]
                           change: [−35.582% −32.807% −30.598%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     3 (3.00%) high mild
     3 (3.00%) high severe
   
   cast runs of 1000 int32s to ree<int32>
                           time:   [1.9651 µs 1.9923 µs 2.0449 µs]
                           change: [−35.958% −34.582% −33.095%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     2 (2.00%) high mild
     3 (3.00%) high severe
   
   cast no runs of int32s to ree<int32>
                           time:   [27.745 µs 28.013 µs 28.291 µs]
                           change: [−27.957% −27.305% −26.645%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 14 outliers among 100 measurements (14.00%)
     14 (14.00%) high mild
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
