[PR] perf: Optimize lpad, rpad for ASCII strings [datafusion]

via GitHub Tue, 10 Feb 2026 11:47:15 -0800


neilconway opened a new pull request, #20278:
URL: https://github.com/apache/datafusion/pull/20278


   The previous implementation incurred the overhead of Unicode machinery, even 
for the common case where both the input string and the fill string consist 
only of ASCII characters. For the ASCII-only case, we can assume that the 
length in bytes equals the length in characters, and avoid expensive 
grapheme-based segmentation. This follows similar optimizations applied 
elsewhere in the codebase.
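
   As a rough illustration of the idea (not the code in this PR), the fast path 
can be sketched as below. `lpad_sketch` is a hypothetical standalone helper, 
and the slow path uses `chars()` as a stand-in for the grapheme segmentation 
mentioned above so that the sketch needs only the standard library.

```rust
/// Hypothetical helper (not DataFusion's API): left-pad `s` to `target_len`
/// characters with `fill`, truncating `s` if it is already too long.
fn lpad_sketch(s: &str, target_len: usize, fill: &str) -> String {
    if s.is_ascii() && fill.is_ascii() {
        // Fast path: for ASCII, byte length == character length, so byte
        // slicing is safe and no segmentation is needed.
        if target_len <= s.len() {
            return s[..target_len].to_string();
        }
        if fill.is_empty() {
            return s.to_string();
        }
        let pad_len = target_len - s.len();
        let mut out = String::with_capacity(target_len);
        // Whole copies of the fill string, then a partial copy if needed.
        for _ in 0..pad_len / fill.len() {
            out.push_str(fill);
        }
        out.push_str(&fill[..pad_len % fill.len()]);
        out.push_str(s);
        return out;
    }

    // Slow path. Assumption: `chars()` approximates the grapheme-based
    // segmentation that a real implementation would use here.
    let chars: Vec<char> = s.chars().collect();
    if target_len <= chars.len() {
        return chars[..target_len].iter().collect();
    }
    let fill_chars: Vec<char> = fill.chars().collect();
    if fill_chars.is_empty() {
        return s.to_string();
    }
    let pad_len = target_len - chars.len();
    let mut out = String::new();
    out.extend(fill_chars.iter().cycle().take(pad_len));
    out.push_str(s);
    out
}
```

   In this sketch, `lpad_sketch("hi", 5, "xy")` yields `"xyxhi"` and 
`lpad_sketch("hello", 2, "x")` truncates to `"he"`. The `is_ascii()` check is 
itself a scan over the input, which is one plausible source of the small 
slowdown on the Unicode path.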
   
   Benchmarks indicate this is a significant performance win for ASCII-only 
input (4x-10x faster), at the cost of a mild regression for Unicode input 
(2-5% slower).
   
   Along the way:
   
   * Replace a few instances of `write_str(str)? + append_value("")` with 
`append_value(str)`, which saves a few cycles
   * Add a missing test case for truncating the input string
   * Add benchmarks for Unicode input
   
   ## Which issue does this PR close?
   
   - Closes #20277.
   
   ## Are these changes tested?
   
   Covered by existing tests, plus the new test case for truncating the input 
string. Added new benchmarks for Unicode inputs.
   
   ## Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

