isidentical commented on PR #3765: URL: https://github.com/apache/arrow-datafusion/pull/3765#issuecomment-1274956304
> I wonder if you have any benchmark results that show this pattern improving performance?

Yep. I was playing with the idea of reusing the existing null buffers for `regex_replace` when I noticed that ~30% of the time was spent here (if the input is a sparse-ish array). The example at https://github.com/isidentical/arrow-datafusion/pull/3 shows a ~20-25% speed-up, and if we make the input even more sparse the speed-up can reach 30-35% (since there isn't much regex processing, all the time was spent on padding arrays with data we won't be using at all).

Also, a subsequent optimization (using the existing null buffers) was unnoticeable before this change, but now we can clearly see its benefit, since the actual time is now spent on result array construction inside `regex_replace` (instead of on array padding in the scalar adapter).
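To make the "padding" cost concrete, here is a rough std-only sketch of the idea rather than the actual DataFusion code. Everything in it is an assumption made for illustration: `Arg`, `pad`, `singular`, `pick` and `replace_rows` are hypothetical names, `Vec<Option<String>>` stands in for an Arrow string array, and a plain substring replace stands in for the regex call.

```rust
/// Hypothetical stand-in for a function argument: a full column or one scalar.
#[derive(Debug, Clone)]
enum Arg {
    Array(Vec<Option<String>>),
    Scalar(Option<String>),
}

/// Padding strategy: a scalar is copied `len` times so every argument has the
/// same length. For large batches most of these copies are never needed.
fn pad(arg: &Arg, len: usize) -> Vec<Option<String>> {
    match arg {
        Arg::Array(values) => values.clone(),
        Arg::Scalar(value) => vec![value.clone(); len],
    }
}

/// Singular strategy: a scalar becomes a length-1 array and the kernel is
/// expected to reuse element 0 for every row.
fn singular(arg: &Arg) -> Vec<Option<String>> {
    match arg {
        Arg::Array(values) => values.clone(),
        Arg::Scalar(value) => vec![value.clone()],
    }
}

/// Read row `row` of an argument, treating a length-1 array as a broadcast scalar.
fn pick(arg: &[Option<String>], row: usize) -> &Option<String> {
    if arg.len() == 1 { &arg[0] } else { &arg[row] }
}

/// Substring replace over a nullable column; a null in any argument yields a null row.
fn replace_rows(
    input: &[Option<String>],
    from: &[Option<String>],
    to: &[Option<String>],
) -> Vec<Option<String>> {
    input
        .iter()
        .enumerate()
        .map(|(row, value)| match (value, pick(from, row), pick(to, row)) {
            (Some(v), Some(f), Some(t)) => Some(v.replace(f.as_str(), t)),
            _ => None,
        })
        .collect()
}
```

Both strategies give the same result; the difference is that `pad` allocates one copy of the pattern and replacement per row, while `singular` allocates one element regardless of batch size. On a sparse input that per-row copying is a large share of the total work, which matches the profile described above.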
