pepijnve opened a new pull request, #23057:
URL: https://github.com/apache/datafusion/pull/23057

   ## Which issue does this PR close?
   
   - None yet
   
   ## Rationale for this change
   
   During recent profiling work, string concatenation proved to be a hotspot. 
Investigation of the current kernel implementation for string views showed that 
there was still some room for improvement.
   
   - Preallocating the exact size of the required output buffers can avoid 
reallocations.
   - Copying data directly into the final data buffer avoids a memcpy from the 
temporary buffer.
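
   The two points above can be illustrated with a minimal, std-only sketch. It is not the kernel from this PR and does not model the Arrow view layout. `concat_rows` is a hypothetical helper that stores the output as one contiguous byte buffer plus per-row end offsets, an assumption made only to keep the example self-contained.

```rust
/// Hypothetical helper (not the PR's code): concatenate two string columns row by row.
/// Returns one contiguous data buffer plus the end offset of each output row.
fn concat_rows(left: &[&str], right: &[&str]) -> (Vec<u8>, Vec<usize>) {
    assert_eq!(left.len(), right.len(), "columns must have the same row count");

    // Pass 1: compute the exact number of output bytes so the buffer is
    // allocated once and never has to grow.
    let total: usize = left
        .iter()
        .zip(right)
        .map(|(l, r)| l.len() + r.len())
        .sum();

    let mut data = Vec::with_capacity(total);
    let mut offsets = Vec::with_capacity(left.len());

    // Pass 2: copy each input straight into the final buffer. No per-row
    // temporary String is built, so there is no second copy afterwards.
    for (l, r) in left.iter().zip(right) {
        data.extend_from_slice(l.as_bytes());
        data.extend_from_slice(r.as_bytes());
        offsets.push(data.len());
    }

    (data, offsets)
}
```

   Calling `concat_rows(&["ab", "", "xyz"], &["c", "d", ""])` fills the buffer with the bytes of `abcdxyz` in a single allocation. The trade-off is an extra pass over the inputs to sum the lengths, which is typically cheap compared with repeated reallocation and copying.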
   
   Together, these changes can yield a ~30% improvement on the string concat benchmark.
   
   Note that this work is a port of 
https://github.com/apache/arrow-rs/pull/10161. Ideally the implementation from 
Arrow is used by DataFusion once the PR in that project is merged and released. 
Since DataFusion currently uses a custom kernel it seemed to make sense to 
temporarily port the proposed PR from Arrow.
   
   ## What changes are included in this PR?
   
   - Rewrite the byte view concatenation kernels
   
   ## Are these changes tested?
   
   Covered by existing tests
   
   ## Are there any user-facing changes?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

