neilconway opened a new pull request, #20635: URL: https://github.com/apache/datafusion/pull/20635
## Which issue does this PR close?

- Closes #20634.

## Rationale for this change

The current `to_char` implementation (both scalar and array paths) allocates a new string for every input row to hold the result of the `ArrayFormatter`. To produce the results, it uses `StringArray::from`, which copies.

We can do better by using a reusable buffer to store the result of `ArrayFormatter`, and then appending to a `StringBuilder`. We can then construct the final result values from the `StringBuilder` very efficiently. This yields a 10-20% improvement in the `to_char` microbenchmarks.

Note that we still do an unnecessary copy from the reusable buffer to the `StringBuilder`; in principle, we could arrange for the `ArrayFormatter` to write into the `StringBuilder`'s buffer directly. But that would require either changes in Arrow or inventing our own construct, so I'll defer that for now.

This PR also cleans up various code in `to_char`, and fixes a bug in `NULL` handling: in the array case, if the current data value is NULL but the format string is non-NULL, we incorrectly returned an empty string instead of NULL.

## What changes are included in this PR?

* Optimize `to_char` (scalar and array paths) as described above
* Fix bug in NULL handling
* Add SLT test case for NULL handling bug
* Simplify and refactor various parts of `to_char`, particularly around error handling

## Are these changes tested?

Yes. Benchmarked and added new test.

## Are there any user-facing changes?

No.
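For readers unfamiliar with the pattern, the reusable-buffer approach from the rationale can be sketched roughly as follows. This is a standard-library-only illustration, not the code in this PR: `MiniStringBuilder` is a hypothetical stand-in for Arrow's `StringBuilder`, `format_all` is a hypothetical driver, and the `day-{v:03}` format is a toy replacement for whatever `ArrayFormatter` would produce. It also shows the NULL behavior described above, where a NULL data value yields NULL rather than an empty string.

```rust
use std::fmt::Write;

/// Hypothetical stand-in for Arrow's `StringBuilder`: all values live in one
/// contiguous buffer, delimited by offsets, with a per-row validity flag.
struct MiniStringBuilder {
    values: String,
    offsets: Vec<usize>,
    validity: Vec<bool>,
}

impl MiniStringBuilder {
    fn new() -> Self {
        Self { values: String::new(), offsets: vec![0], validity: Vec::new() }
    }

    /// Copy `s` into the shared value buffer and record where it ends.
    fn append_value(&mut self, s: &str) {
        self.values.push_str(s);
        self.offsets.push(self.values.len());
        self.validity.push(true);
    }

    /// Record a NULL row: a zero-length slot that is marked invalid.
    fn append_null(&mut self) {
        self.offsets.push(self.values.len());
        self.validity.push(false);
    }

    fn len(&self) -> usize {
        self.validity.len()
    }

    fn get(&self, i: usize) -> Option<&str> {
        if self.validity[i] {
            Some(&self.values[self.offsets[i]..self.offsets[i + 1]])
        } else {
            None
        }
    }
}

/// Hypothetical driver: format every row through a single scratch buffer
/// instead of allocating a fresh `String` per row.
fn format_all(rows: &[Option<i64>]) -> MiniStringBuilder {
    let mut builder = MiniStringBuilder::new();
    let mut buf = String::new(); // allocated once, reused for every row
    for row in rows {
        match row {
            Some(v) => {
                buf.clear(); // keeps the capacity, drops the old contents
                write!(buf, "day-{v:03}").expect("writing to a String cannot fail");
                builder.append_value(&buf); // the one remaining copy
            }
            // A NULL input must stay NULL, not become an empty string.
            None => builder.append_null(),
        }
    }
    builder
}
```

Calling `format_all(&[Some(7), None, Some(42)])` produces three rows, the second of which is NULL. The copy in `append_value` corresponds to the remaining buffer-to-builder copy that the rationale mentions deferring.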
