neilconway commented on code in PR #21535:
URL: https://github.com/apache/datafusion/pull/21535#discussion_r3065503747
##########
datafusion/physical-expr/src/expressions/binary/kernels.rs:
##########
@@ -177,24 +178,27 @@ pub fn concat_elements_utf8view(
right.len()
)));
}
- let capacity = left.len();
- let mut result = StringViewBuilder::with_capacity(capacity);
+ let mut result = StringViewBuilder::with_capacity(left.len());
- // Avoid reallocations by writing to a reused buffer (note we
- // could be even more efficient r by creating the view directly
- // here and avoid the buffer but that would be more complex)
+ // Avoid reallocations by writing to a reused buffer (note we could be even
+ // more efficient by creating the view directly here and avoid the buffer
+ // but that would be more complex)
let mut buffer = String::new();
- for (left, right) in left.iter().zip(right.iter()) {
- if let (Some(left), Some(right)) = (left, right) {
- use std::fmt::Write;
+ // Pre-compute combined null bitmap, so the per-row NULL check is more
+ // efficient
+ let nulls = NullBuffer::union(left.nulls(), right.nulls());
+
+ for i in 0..left.len() {
+ if nulls.as_ref().is_some_and(|n| n.is_null(i)) {
Review Comment:
We could hoist this outside the loop, although
1. This is a very common pattern in the code base
2. I'd think the branch predictor should be able to handle this very
effectively
3. We'd need to duplicate the loop body to handle the two cases, no?
I'd say the current approach is okay but lmk if you disagree.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]