Dandandan opened a new pull request, #21378:
URL: https://github.com/apache/datafusion/pull/21378

   ## Which issue does this PR close?
   
   N/A - Performance improvement
   
   ## Rationale for this change
   
   The `character_length` UDF used a generic `StringArrayType` trait implementation that read the string data even when only byte lengths were needed (the ASCII case), which added unnecessary memory-access overhead.
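   For ASCII input every character is exactly one byte, so the character count equals the byte length and no UTF-8 decoding is needed. The std-only sketch below illustrates that distinction for a single string. `char_len` is a hypothetical helper, not the DataFusion implementation, which performs the ASCII check once per array rather than per string.

   ```rust
   /// Hypothetical helper (not DataFusion's API): character length of one string.
   /// ASCII strings use the byte length directly; anything else falls back to
   /// decoding UTF-8 code points with `chars().count()`.
   fn char_len(s: &str) -> usize {
       if s.is_ascii() {
           // One byte per character, so the byte length is the character count.
           s.len()
       } else {
           // Multi-byte UTF-8 sequences present: count decoded chars.
           s.chars().count()
       }
   }
   ```

   For example, `"héllo".len()` is 6 because `é` takes two bytes in UTF-8, while `char_len("héllo")` is 5.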
   
   ## What changes are included in this PR?
   
   Replaces the generic implementation with type-specific optimized versions:
   
   - **Utf8/LargeUtf8 ASCII path**: Computes lengths directly from the offsets buffer without touching the string data at all
   - **StringView ASCII path**: Reads string lengths from the view metadata (the first 4 bytes of each 128-bit view)
   - **Non-ASCII paths**: Unchanged, still uses `chars().count()`
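   The two ASCII fast paths can be modeled with plain slices. This is a simplified, std-only sketch, not the arrow-rs or DataFusion API. The function names and the `inline_view` test helper are hypothetical, nulls are ignored, and for StringView the caller is assumed to have determined already that every value is ASCII.

   ```rust
   /// Utf8 model: string `i` occupies `values[offsets[i]..offsets[i + 1]]`.
   /// After one ASCII check over the values buffer, each length is just the
   /// difference of adjacent offsets; individual strings are never decoded.
   /// (LargeUtf8 is identical but uses i64 offsets.)
   fn lengths_from_offsets(offsets: &[i32], values: &[u8]) -> Option<Vec<i32>> {
       if !values.is_ascii() {
           return None; // caller falls back to chars().count()
       }
       Some(offsets.windows(2).map(|w| w[1] - w[0]).collect())
   }

   /// StringView model: each view is a 16-byte u128 whose lowest 32 bits
   /// (the first 4 bytes in little-endian order) hold the byte length.
   /// `all_ascii` is assumed to be computed by the caller.
   fn lengths_from_views(views: &[u128], all_ascii: bool) -> Option<Vec<i32>> {
       if !all_ascii {
           return None;
       }
       // Truncating to u32 keeps exactly the length field.
       Some(views.iter().map(|v| *v as u32 as i32).collect())
   }

   /// Hypothetical test helper: build an inline view (strings of <= 12 bytes
   /// are stored inside the view after the 4-byte length).
   fn inline_view(s: &str) -> u128 {
       assert!(s.len() <= 12, "inline views hold at most 12 bytes");
       let mut bytes = [0u8; 16];
       bytes[..4].copy_from_slice(&(s.len() as u32).to_le_bytes());
       bytes[4..4 + s.len()].copy_from_slice(s.as_bytes());
       u128::from_le_bytes(bytes)
   }
   ```

   With offsets `[0, 3, 3, 8]` over the values `b"abchello"`, `lengths_from_offsets` yields `[3, 0, 5]` without inspecting any individual string.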
   
   ## Are these changes tested?
   
   Existing tests pass. Benchmarked with `cargo bench --bench character_length`:
   
   | Benchmark | Before | After | Speedup |
   |---|---|---|---|
   | StringArray ASCII 8 | 5.81 µs | 3.06 µs | **1.9x** |
   | StringArray ASCII 32 | 12.91 µs | 10.11 µs | **1.28x** |
   | StringArray ASCII 128 | 40.48 µs | 38.20 µs | **1.06x** |
   | StringArray UTF8 8 | 59.8 µs | 48.6 µs | **1.23x** |
   | StringArray UTF8 32 | 97.9 µs | 82.3 µs | **1.19x** |
   | StringArray UTF8 128 | 145.5 µs | 134.3 µs | **1.08x** |
   | StringViewArray ASCII 8 | 15.75 µs | 13.96 µs | **1.13x** |
   | StringViewArray ASCII 32 | 24.62 µs | 23.46 µs | **1.05x** |
   | StringViewArray UTF8 128 | 166.4 µs | 159.0 µs | **1.05x** |
   
   No regressions observed.
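   The Speedup column is the "Before" time divided by the "After" time. The small check below reproduces a few rows of the table; the `speedup` function is illustrative only.

   ```rust
   /// Speedup as reported in the benchmark table: time before / time after.
   /// Both arguments are in the same unit (µs here).
   fn speedup(before_us: f64, after_us: f64) -> f64 {
       before_us / after_us
   }
   ```

   For example, `speedup(5.81, 3.06)` is about 1.899, which the table rounds to 1.9x.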
   
   ## Are there any user-facing changes?
   
   No.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
