tlm365 commented on code in PR #14025:
URL: https://github.com/apache/datafusion/pull/14025#discussion_r1905010563
##########
datafusion/functions/src/unicode/reverse.rs:
##########
@@ -116,14 +115,23 @@ pub fn reverse<T: OffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
}
}
-fn reverse_impl<'a, T: OffsetSizeTrait, V: ArrayAccessor<Item = &'a str>>(
+fn reverse_impl<'a, T: OffsetSizeTrait, V: StringArrayType<'a>>(
string_array: V,
) -> Result<ArrayRef> {
- let result = ArrayIter::new(string_array)
- .map(|string| string.map(|string: &str| string.chars().rev().collect::<String>()))
- .collect::<GenericStringArray<T>>();
+ let mut builder: GenericStringBuilder<T> =
+ GenericStringBuilder::with_capacity(string_array.len(), 1024);
Review Comment:
@2010YOUY01 Thanks for reviewing,
> I think we can use the actual data size here for pre-allocation, instead
> of a constant 1024, the complexity of adding another argument for array size
> seems reasonable
I agree that it would be better to pre-allocate using the actual data size
here, but I think it's difficult to compute accurately, since it depends on
context. Keeping it simple here seems reasonable as well.
Currently `GenericStringBuilder` has `new` and `with_capacity` for creating a
builder, and 1024 is the default size used by `GenericStringBuilder::new`
([ref](https://github.com/apache/arrow-rs/blob/4f1f6e57c568fae8233ab9da7d7c7acdaea4112a/arrow-array/src/builder/generic_bytes_builder.rs#L39-L41)),
which is why I chose 1024 here.
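
To make the trade-off concrete, here is a minimal std-only sketch, not the code in this PR. `ReversedStrings`, `reverse_all`, and `total_bytes` are hypothetical names, and the struct is only a simplified stand-in for an Arrow string builder (one contiguous data buffer plus offsets). It shows that the data capacity is just a pre-allocation hint: an exact size and a constant 1024 produce the same result and differ only in how the buffer is allocated. It does not show how to obtain the data size from a real Arrow array, which is the context-dependent part.

```rust
/// Simplified stand-in for an Arrow string builder's output: one contiguous
/// data buffer plus offsets. Hypothetical type, not part of arrow-rs.
struct ReversedStrings {
    data: String,
    offsets: Vec<usize>,
    valid: Vec<bool>,
}

impl ReversedStrings {
    /// Returns the i-th value, or `None` if that slot is null.
    fn get(&self, i: usize) -> Option<&str> {
        if self.valid[i] {
            Some(&self.data[self.offsets[i]..self.offsets[i + 1]])
        } else {
            None
        }
    }
}

/// Reverses every non-null string into one buffer that is pre-allocated
/// with `data_capacity` bytes. The capacity is only a hint: the buffer
/// still grows if the hint is too small.
fn reverse_all(input: &[Option<&str>], data_capacity: usize) -> ReversedStrings {
    let mut data = String::with_capacity(data_capacity);
    let mut offsets = Vec::with_capacity(input.len() + 1);
    let mut valid = Vec::with_capacity(input.len());
    offsets.push(0);
    for item in input {
        if let Some(s) = item {
            // Reversing by `char` keeps the UTF-8 byte length unchanged.
            data.extend(s.chars().rev());
        }
        valid.push(item.is_some());
        offsets.push(data.len());
    }
    ReversedStrings { data, offsets, valid }
}

/// Total number of data bytes across all non-null inputs.
fn total_bytes(input: &[Option<&str>]) -> usize {
    input.iter().flatten().map(|s| s.len()).sum()
}
```

Calling `reverse_all(&input, total_bytes(&input))` sizes the buffer exactly, while `reverse_all(&input, 1024)` mirrors the constant used in the diff; both return the same reversed values.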