pepijnve opened a new pull request, #23057: URL: https://github.com/apache/datafusion/pull/23057
## Which issue does this PR close?

- None yet

## Rationale for this change

During recent profiling work, string concatenation proved to be a hotspot. Investigation of the current kernel implementation for string views showed that there was still some room for improvement.

- Preallocating the exact size of the required output buffers can avoid reallocations.
- By copying data directly to the final data buffer, a memcpy from the temp buffer can be avoided.

Together, these changes can result in a ~30% improvement per the string concat benchmark.

Note that this work is a port of https://github.com/apache/arrow-rs/pull/10161. Ideally, DataFusion would use the Arrow implementation once the PR in that project is merged and released. Since DataFusion currently uses a custom kernel, it seemed to make sense to temporarily port the proposed PR from Arrow.

## What changes are included in this PR?

- Rewrite the byte view concatenation kernels

## Are these changes tested?

Covered by existing tests

## Are there any user-facing changes?

No
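For readers who want to see the two ideas from the rationale in isolation, here is a minimal sketch. It is not the kernel from this PR: `concat_rows` is a hypothetical name, and the output layout (one contiguous byte buffer plus an offsets vector) is a simplifying assumption, whereas real byte view arrays store fixed-size views and inline short strings. It only illustrates computing the exact output size up front and copying each input straight into the final buffer instead of building a temporary per row.

```rust
/// Concatenate two string columns row by row.
///
/// Hypothetical helper for illustration only. Returns the concatenated
/// bytes and the end offset of every row (with a leading 0).
fn concat_rows(left: &[&str], right: &[&str]) -> (Vec<u8>, Vec<usize>) {
    assert_eq!(left.len(), right.len(), "columns must have equal length");

    // Pass 1: compute the exact number of output bytes.
    let total: usize = left
        .iter()
        .zip(right)
        .map(|(l, r)| l.len() + r.len())
        .sum();

    // Allocate once. The capacity is at least `total`, so the pushes
    // below never need to grow the buffer.
    let mut data: Vec<u8> = Vec::with_capacity(total);
    let mut offsets: Vec<usize> = Vec::with_capacity(left.len() + 1);
    offsets.push(0);

    // Pass 2: copy both inputs directly into the final buffer,
    // without an intermediate per-row String.
    for (l, r) in left.iter().zip(right) {
        data.extend_from_slice(l.as_bytes());
        data.extend_from_slice(r.as_bytes());
        offsets.push(data.len());
    }

    (data, offsets)
}
```

The trade-off in this sketch is an extra pass over the input lengths in exchange for a single allocation and a single copy per value. Whether that pays off in a real kernel is what the benchmark mentioned in the rationale measures.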
