WinkerDu commented on code in PR #2183:
URL: https://github.com/apache/arrow-datafusion/pull/2183#discussion_r846830459
##########
datafusion/physical-expr/src/expressions/binary.rs:
##########
@@ -430,17 +431,17 @@ fn string_concat(left: ArrayRef, right: ArrayRef) -> Result<ArrayRef> {
scalar_value => scalar_value.into_array(left.clone().len()),
};
let ignore_null_array =
ignore_null.as_any().downcast_ref::<StringArray>().unwrap();
- let result = (0..ignore_null_array.len())
+ let index_array = (0..ignore_null_array.len())
.into_iter()
.map(|index| {
if left.is_null(index) || right.is_null(index) {
None
} else {
- Some(ignore_null_array.value(index))
+ Some(index as u32)
}
})
- .collect::<StringArray>();
-
+ .collect::<UInt32Array>();
+ let result = take(ignore_null_array, &index_array, None)?;
Review Comment:
@alamb thanks for the advice.
I think we can optimize the whole string concat process to avoid this value
copying, something like:
- original process: run the built-in `concat`, which ignores `NULL` -> generate
a validity bitmap or an array of indexes -> take the valid values from the
`concat` output array according to the bitmap or indexes
- optimized process: concatenate the two input string arrays with `NULL`
handled directly, so no value copy is needed.
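
To make the two processes concrete, here is a minimal sketch using only the Rust standard library. It is not the DataFusion or Arrow implementation: slices of `Option<&str>` stand in for `StringArray`, and the names `concat_then_take` and `concat_null_aware` are hypothetical, chosen only for illustration.

```rust
/// Original process (modeled): concat ignoring NULL, build an index array,
/// then take the valid values from the concat output.
/// Hypothetical helper; `Option<&str>` slices stand in for `StringArray`.
fn concat_then_take(left: &[Option<&str>], right: &[Option<&str>]) -> Vec<Option<String>> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");

    // Step 1: concatenate every row, treating NULL as an empty string.
    let concatenated: Vec<String> = left
        .iter()
        .zip(right)
        .map(|(l, r)| format!("{}{}", l.unwrap_or(""), r.unwrap_or("")))
        .collect();

    // Step 2: build the index array; a row is NULL if either input is NULL.
    let indexes: Vec<Option<usize>> = (0..left.len())
        .map(|i| {
            if left[i].is_none() || right[i].is_none() {
                None
            } else {
                Some(i)
            }
        })
        .collect();

    // Step 3: "take" by index. Each valid value is copied a second time here.
    indexes
        .iter()
        .map(|idx| idx.map(|i| concatenated[i].clone()))
        .collect()
}

/// Optimized process (modeled): handle NULL while concatenating, so each
/// output value is written exactly once and no take step is needed.
fn concat_null_aware(left: &[Option<&str>], right: &[Option<&str>]) -> Vec<Option<String>> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");

    left.iter()
        .zip(right)
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(format!("{}{}", l, r)),
            // NULL on either side yields NULL without building a string.
            _ => None,
        })
        .collect()
}
```

For inputs `["a", NULL, "c"]` and `["x", "y", NULL]`, both functions return `["ax", NULL, NULL]`. The difference is that the second one never builds the intermediate strings for NULL rows and never copies a value out of an intermediate array.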