isidentical commented on PR #3765:
URL: 
https://github.com/apache/arrow-datafusion/pull/3765#issuecomment-1274956304

   > I wonder if you have any benchmark results that show this pattern 
improving performance?
   
   Yep. I was experimenting with reusing the existing null buffers in 
`regex_replace` when I noticed that roughly 30% of the time was spent here 
(when the input is a sparse-ish array). The example at 
https://github.com/isidentical/arrow-datafusion/pull/3 shows a speed-up of 
about 20-25%, and making the input even more sparse can raise that to 30-35% 
(since there isn't much regex processing, all the time was spent on padding 
arrays with data we won't be using at all).
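
   To illustrate the padding cost described above, here is a rough, 
std-only sketch. It is not the DataFusion code: `replace_padded` and 
`replace_scalar` are hypothetical names, `Option<&str>` stands in for a 
nullable string array, and `str::replace` stands in for the regex 
replacement.

```rust
/// Padded path (illustrative): the scalar pattern and replacement are first
/// expanded to one copy per row, including rows that turn out to be null.
fn replace_padded(
    input: &[Option<&str>],
    pattern: &str,
    replacement: &str,
) -> Vec<Option<String>> {
    let n = input.len();
    // These two allocations are the "padding": n copies of the same value.
    let patterns: Vec<String> = vec![pattern.to_string(); n];
    let replacements: Vec<String> = vec![replacement.to_string(); n];

    input
        .iter()
        .copied()
        .zip(patterns.iter().zip(replacements.iter()))
        // Null rows skip the replacement, so their padded copies go unused.
        .map(|(value, (p, r))| value.map(|v| v.replace(p.as_str(), r.as_str())))
        .collect()
}

/// Scalar path (illustrative): the scalar arguments are borrowed as-is,
/// so nothing is allocated per row for them.
fn replace_scalar(
    input: &[Option<&str>],
    pattern: &str,
    replacement: &str,
) -> Vec<Option<String>> {
    input
        .iter()
        .copied()
        .map(|value| value.map(|v| v.replace(pattern, replacement)))
        .collect()
}
```

   Both functions return the same result. The only difference is the 
per-row copies of the scalar arguments, which are pure overhead when most 
rows are null.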
   
   Also, a subsequent optimization (using the existing null buffers) was 
unnoticeable before this change, but now we can clearly see its benefit, 
since the time is now actually spent on result array construction inside 
`regex_replace` (instead of on array padding in the scalar adapter).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
