alamb commented on code in PR #9732:
URL: https://github.com/apache/arrow-datafusion/pull/9732#discussion_r1535993449
##########
datafusion/physical-expr/src/string_expressions.rs:
##########
@@ -227,6 +229,132 @@ pub fn concat(args: &[ColumnarValue]) -> Result<ColumnarValue> {
}
}
+enum ColumnarValueRef<'a> {
+ Scalar(&'a [u8]),
+ Array(&'a StringArray),
+}
+
+impl<'a> ColumnarValueRef<'a> {
+ #[inline]
+ fn is_valid(&self, i: usize) -> bool {
+ match &self {
+ Self::Scalar(_) => true,
+ Self::Array(array) => array.is_valid(i),
+ }
+ }
+
+ #[inline]
+ fn nulls(&self) -> Option<NullBuffer> {
+ match &self {
+ Self::Scalar(_) => None,
+ Self::Array(array) => array.nulls().map(|b| b.clone()),
+ }
+ }
+}
+
+struct StringArrayBuilder {
Review Comment:
I think some comments explaining how this differs from
https://docs.rs/arrow/latest/arrow/array/type.StringBuilder.html would help.
Maybe simply a note that it does not check for valid UTF-8 again?
I wonder if we could get the same effect by adding an `unsafe` function to
`StringBuilder`, like
```rust
/// Adds bytes to the in progress string, without checking for valid utf8
///
/// Safety: requires that bytes are valid utf8, otherwise an invalid
/// StringArray will result
unsafe fn append_unchecked(&mut self, bytes: &[u8])
```
And then using `StringBuilder` here
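To make the suggestion concrete, here is a rough, std-only sketch of what an unchecked append could look like. `UncheckedStringBuilder`, `finish_value`, and `value` are hypothetical names made up for illustration; this is not arrow's `StringBuilder` and says nothing about how arrow lays out its buffers internally. It only shows the contract: the caller promises valid UTF-8, so the builder skips validation on both write and read.

```rust
/// Hypothetical, std-only stand-in for a string array builder.
/// This is NOT arrow's `StringBuilder`; it only illustrates the idea.
pub struct UncheckedStringBuilder {
    /// Bytes of all finished strings, followed by the in-progress one.
    values: Vec<u8>,
    /// `offsets[i]..offsets[i + 1]` is the byte range of string `i`.
    offsets: Vec<usize>,
}

impl UncheckedStringBuilder {
    pub fn new() -> Self {
        Self { values: Vec::new(), offsets: vec![0] }
    }

    /// Adds bytes to the in-progress string without checking for valid UTF-8.
    ///
    /// # Safety
    ///
    /// The bytes appended between two `finish_value` calls must together
    /// form valid UTF-8, otherwise `value` hands out an invalid `&str`.
    pub unsafe fn append_unchecked(&mut self, bytes: &[u8]) {
        self.values.extend_from_slice(bytes);
    }

    /// Ends the in-progress string and starts a new, empty one.
    pub fn finish_value(&mut self) {
        self.offsets.push(self.values.len());
    }

    /// Number of finished strings.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns finished string `i`; panics if `i >= self.len()`.
    pub fn value(&self, i: usize) -> &str {
        let bytes = &self.values[self.offsets[i]..self.offsets[i + 1]];
        // SAFETY: callers of `append_unchecked` guaranteed valid UTF-8.
        unsafe { std::str::from_utf8_unchecked(bytes) }
    }
}
```

A `concat`-style caller would call `append_unchecked` once per input column for a row and then `finish_value` once, which is safe as long as every input already comes from a validated string.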