neilconway opened a new pull request, #20657:
URL: https://github.com/apache/datafusion/pull/20657

   ## Which issue does this PR close?
   
   - Closes #20655.
   
   ## Rationale for this change
   
   `lpad` and `rpad` are commonly called with constant (scalar) target length 
and fill arguments, e.g. `lpad(column, 20, '0')`. We can special-case this 
scenario to improve performance by avoiding the overhead of 
`make_scalar_function`, and also by precomputing the padding buffer and reusing 
it for each row.
   
   For scalar args, this improves performance by ~65% for ASCII inputs and ~41% 
for Unicode inputs.
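   
   A minimal sketch of the precomputed-buffer idea, not the code in this PR: 
it works on plain `&str` slices instead of Arrow arrays, and the name 
`lpad_scalar_fill` is hypothetical. It assumes PostgreSQL-style semantics 
(truncate on the right when the input is already too long, add nothing when 
the fill is empty).

```rust
/// Hypothetical helper, for illustration only: left-pads every row to
/// `target_len` characters, building the padding once and slicing it per row.
fn lpad_scalar_fill(rows: &[&str], target_len: usize, fill: &str) -> Vec<String> {
    // Build the padding buffer once: `fill` repeated until it holds
    // `target_len` characters (empty if `fill` is empty).
    let padding: String = fill.chars().cycle().take(target_len).collect();

    // offsets[n] is the byte length of the first n characters of `padding`,
    // so a per-row prefix can be sliced without rescanning the fill string.
    let mut offsets: Vec<usize> = padding.char_indices().map(|(i, _)| i).collect();
    offsets.push(padding.len());

    rows.iter()
        .map(|s| {
            let n = s.chars().count();
            if n >= target_len {
                // Input is already long enough: keep the first `target_len` chars.
                s.chars().take(target_len).collect()
            } else {
                // Clamp to the buffer size so an empty fill adds nothing.
                let pad_chars = (target_len - n).min(offsets.len() - 1);
                let pad = &padding[..offsets[pad_chars]];
                let mut out = String::with_capacity(pad.len() + s.len());
                out.push_str(pad);
                out.push_str(s);
                out
            }
        })
        .collect()
}
```

   With these assumptions, `lpad_scalar_fill(&["7"], 3, "0")` yields `"007"`. 
The per-row work is reduced to one character count and two `push_str` calls.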
   
   ## What changes are included in this PR?
   
   - Add benchmarks for padding with scalar length and fill.
   - Add a scalar fast path for `lpad` and `rpad` that precomputes a padding 
buffer.
   - Code cleanup: extract and use `try_as_scalar_str` and `try_as_scalar_i64` 
helpers.
   - Code cleanup: make `rpad` and `lpad` more similar by removing needless 
variation between the two implementations. We could go further and refactor 
them to remove the redundancy, but I won't attempt that for now.
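   
   As a rough illustration of how scalar-extraction helpers can gate a fast 
path, here is a sketch using stand-in types. `Arg`, `Scalar`, `scalar_str`, 
`scalar_i64`, and `can_use_fast_path` are all hypothetical; they are not 
DataFusion's `ColumnarValue` or `ScalarValue`, and they do not reproduce the 
signatures of `try_as_scalar_str` or `try_as_scalar_i64`.

```rust
// Stand-in types for illustration only; the real DataFusion types have
// many more variants and carry Arrow arrays rather than `Vec`s.
#[derive(Debug)]
enum Scalar {
    Utf8(Option<String>),
    Int64(Option<i64>),
}

#[derive(Debug)]
enum Arg {
    Scalar(Scalar),
    Array(Vec<Option<String>>), // one value per row
}

/// Returns `Some(..)` only when `arg` is a scalar string.
/// The inner `Option` distinguishes a NULL scalar from a concrete value.
fn scalar_str(arg: &Arg) -> Option<Option<&str>> {
    match arg {
        Arg::Scalar(Scalar::Utf8(v)) => Some(v.as_deref()),
        _ => None,
    }
}

/// Returns `Some(..)` only when `arg` is a scalar 64-bit integer.
fn scalar_i64(arg: &Arg) -> Option<Option<i64>> {
    match arg {
        Arg::Scalar(Scalar::Int64(v)) => Some(*v),
        _ => None,
    }
}

/// The fast path applies only when both length and fill are scalars;
/// otherwise the caller falls back to the general per-row implementation.
fn can_use_fast_path(length: &Arg, fill: &Arg) -> bool {
    scalar_i64(length).is_some() && scalar_str(fill).is_some()
}
```

   Returning a nested `Option` keeps "not a scalar" separate from "scalar 
NULL", so the caller can handle a NULL length or fill without leaving the fast 
path.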
   
   ## Are these changes tested?
   
   Yes; covered by existing tests. Added new benchmarks.
   
   ## Are there any user-facing changes?
   
   No.
   
   ## AI usage
   
   Multiple AI tools were used to iterate on this PR. I have reviewed and 
understand the resulting code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

