neilconway opened a new pull request, #20657: URL: https://github.com/apache/datafusion/pull/20657
## Which issue does this PR close?

- Closes #20655.

## Rationale for this change

`lpad` and `rpad` are commonly called with constant (scalar) target length and fill arguments, e.g. `lpad(column, 20, '0')`. We can special-case this scenario to improve performance by avoiding the overhead of `make_scalar_function`, and also by precomputing the padding buffer and reusing it for each row.

For scalar args, this improves performance by ~65% for ASCII inputs and ~41% for Unicode inputs.

## What changes are included in this PR?

- Add benchmarks for padding with scalar length and fill.
- Add a scalar fast path for `lpad` and `rpad` that precomputes a padding buffer.
- Code cleanup: extract and use `try_as_scalar_str` and `try_as_scalar_i64` helpers.
- Code cleanup: make `rpad` and `lpad` more similar by removing needless variation between the two implementations. We could go further and refactor them to remove the redundancy, but I won't attempt that for now.

## Are these changes tested?

Yes; covered by existing tests. Added new benchmarks.

## Are there any user-facing changes?

No.

## AI usage

Multiple AI tools were used to iterate on this PR. I have reviewed and understand the resulting code.
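To illustrate the "precompute the padding buffer and reuse it for each row" idea from the rationale, here is a minimal standalone sketch. It is not the code in this PR: `PadPlan` and its methods are hypothetical names, it works on plain `&str` values rather than Arrow arrays, it counts Unicode scalar values (`char`s) as characters, and it ignores NULLs and negative lengths. It assumes PostgreSQL-style semantics, where an input longer than the target length is truncated and an empty fill string adds no padding.

```rust
/// Hypothetical helper: padding precomputed once for a constant
/// target length and fill string, then reused for every row.
struct PadPlan {
    target_len: usize,
    /// `fill` repeated and cut to exactly `target_len` characters
    /// (empty if `fill` is empty).
    padding: String,
    /// `offsets[k]` is the byte length of the first `k` characters of
    /// `padding`, so any prefix can be sliced without rescanning.
    offsets: Vec<usize>,
}

impl PadPlan {
    fn new(target_len: usize, fill: &str) -> Self {
        let padding: String = fill.chars().cycle().take(target_len).collect();
        let mut offsets: Vec<usize> = padding.char_indices().map(|(i, _)| i).collect();
        offsets.push(padding.len());
        PadPlan { target_len, padding, offsets }
    }

    /// Returns the truncated input if it already fills the target
    /// length, otherwise the slice of precomputed padding to add.
    fn split<'a>(&'a self, s: &'a str) -> Result<&'a str, &'a str> {
        let s_chars = s.chars().count();
        if s_chars >= self.target_len {
            // Input is long enough: keep only the first `target_len` chars.
            let end = s
                .char_indices()
                .nth(self.target_len)
                .map_or(s.len(), |(i, _)| i);
            Ok(&s[..end])
        } else {
            // With an empty fill there is no padding available at all.
            let need = (self.target_len - s_chars).min(self.offsets.len() - 1);
            Err(&self.padding[..self.offsets[need]])
        }
    }

    fn lpad(&self, s: &str) -> String {
        match self.split(s) {
            Ok(truncated) => truncated.to_string(),
            Err(pad) => format!("{pad}{s}"),
        }
    }

    fn rpad(&self, s: &str) -> String {
        match self.split(s) {
            Ok(truncated) => truncated.to_string(),
            Err(pad) => format!("{s}{pad}"),
        }
    }
}
```

The plan would be built once per batch (for example `PadPlan::new(20, "0")` for `lpad(column, 20, '0')`) and then applied to each row, so the per-row work is a character count plus one or two slice copies rather than rebuilding the fill sequence each time.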
