[GitHub] [arrow-datafusion] isidentical commented on issue #3518: Improve performance of `regex_replace`

GitBox Sat, 01 Oct 2022 11:53:03 -0700


isidentical commented on issue #3518:
URL: https://github.com/apache/arrow-datafusion/issues/3518#issuecomment-1264450789


   > reusing the null buffer from the input array (instead of rebuilding in the 
iterator)
   
   I looked into this one, but even when the array is very sparse (in my 
tests I used 20% data / 80% nulls), there doesn't seem to be an observable 
difference between building the underlying data buffers ourselves while 
reusing the existing null buffer and rebuilding everything. The experiment is 
[in this 
branch](https://github.com/isidentical/arrow-datafusion/pull/3/files), and the 
results (in release mode) are:
   - baseline: ~0.721s
   - that branch: ~0.690s
   
   So a ~4% speed-up, but I strongly suspect it is just noise. (There is 
also a chance that I completely misunderstood the idea 😄, so I am very open to 
input on the experiment code and on what else I could try.)
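
To make the two strategies being compared concrete, here is a minimal, std-only sketch. It is not the code from the experiment branch and does not use the Arrow API: `NullableStrings`, `replace_rebuilding`, and `replace_reusing_validity` are hypothetical names, the validity mask is a plain `Vec<bool>` rather than a packed bitmap, and `str::replace` stands in for the regex replacement.

```rust
/// Simplified stand-in for a nullable string array (hypothetical type,
/// not Arrow's): one value slot per row plus a validity mask.
#[derive(Debug, Clone, PartialEq)]
struct NullableStrings {
    values: Vec<String>, // empty placeholder string where the row is null
    validity: Vec<bool>, // true = valid, false = null
}

impl NullableStrings {
    fn from_options(items: &[Option<&str>]) -> Self {
        let mut values = Vec::with_capacity(items.len());
        let mut validity = Vec::with_capacity(items.len());
        for &item in items {
            values.push(item.unwrap_or("").to_string());
            validity.push(item.is_some());
        }
        Self { values, validity }
    }
}

/// Strategy A: rebuild everything. Each row becomes an `Option`, and both
/// the values and the validity mask are reconstructed row by row.
fn replace_rebuilding(input: &NullableStrings, from: &str, to: &str) -> NullableStrings {
    let mut values = Vec::with_capacity(input.values.len());
    let mut validity = Vec::with_capacity(input.values.len());
    for (value, &valid) in input.values.iter().zip(&input.validity) {
        let item = if valid { Some(value.replace(from, to)) } else { None };
        match item {
            Some(replaced) => {
                values.push(replaced);
                validity.push(true);
            }
            None => {
                values.push(String::new());
                validity.push(false);
            }
        }
    }
    NullableStrings { values, validity }
}

/// Strategy B: build only the values and copy the input's validity mask
/// as-is, since a replacement never changes which rows are null.
fn replace_reusing_validity(input: &NullableStrings, from: &str, to: &str) -> NullableStrings {
    let values = input
        .values
        .iter()
        .zip(&input.validity)
        .map(|(value, &valid)| {
            if valid {
                value.replace(from, to)
            } else {
                String::new() // null rows keep an empty placeholder
            }
        })
        .collect();
    NullableStrings {
        values,
        validity: input.validity.clone(),
    }
}
```

Both functions produce the same array; the only difference is whether the validity information is recomputed per row or carried over wholesale. In this sketch the per-row replacement does the same work in either strategy, and only the validity bookkeeping differs.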


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
