lyne7-sc commented on code in PR #19547:
URL: https://github.com/apache/datafusion/pull/19547#discussion_r2658734085


##########
datafusion/functions/src/string/concat.rs:
##########
@@ -206,7 +207,11 @@ impl ScalarUDFImpl for ConcatFunc {
                         DataType::Utf8View => {
                             let string_array = as_string_view_array(array)?;
 
-                            data_size += string_array.len();
+                            data_size += string_array
+                                .data_buffers()
+                                .iter()
+                                .map(|buf| buf.len())
+                                .sum::<usize>();

Review Comment:
   Additionally, I noticed similar pre-estimation logic in `concat_ws` that 
uses `data_buffers`.
   
   Given the results for `concat`, I suspect `concat_ws` might also suffer from 
iteration overhead and overestimation.
   
   What do you think about removing this logic from `concat_ws` as well to keep 
the implementation consistent? I'm happy to add more targeted test cases for 
`StringViewArray` in `concat_ws` to ensure we handle these scenarios correctly 
and efficiently.
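
   As a rough illustration of the overestimation concern, here is a minimal 
std-only sketch. It is a toy model, not the arrow-rs API: `View`, `ViewArray`, 
and `sliced_example` are hypothetical names, and the 12-byte inline rule is 
the only layout detail assumed. It compares a buffer-length sum like the one 
in the diff above with the bytes the views actually reference.

```rust
/// Toy stand-in for one Utf8View entry (hypothetical, not arrow's type).
enum View {
    /// Strings of at most 12 bytes are stored inside the view itself.
    Inline(String),
    /// Longer strings point into one of the shared data buffers.
    Buffered { buffer: usize, offset: usize, len: usize },
}

/// Toy stand-in for a string view array: views plus shared data buffers.
struct ViewArray {
    views: Vec<View>,
    data_buffers: Vec<Vec<u8>>,
}

impl ViewArray {
    /// Estimate in the style of the diff: sum of all data buffer lengths.
    fn buffer_len_estimate(&self) -> usize {
        self.data_buffers.iter().map(|buf| buf.len()).sum()
    }

    /// Bytes that the views in this array actually reference.
    fn referenced_bytes(&self) -> usize {
        self.views
            .iter()
            .map(|view| match view {
                View::Inline(s) => s.len(),
                View::Buffered { buffer, offset, len } => {
                    // Slice the buffer to check that the view is in bounds.
                    self.data_buffers[*buffer][*offset..*offset + *len].len()
                }
            })
            .sum()
    }
}

/// Assumed scenario: after a slice or filter, the array keeps one short
/// inline string and one 32-byte string, but still shares a 1 KiB buffer.
fn sliced_example() -> ViewArray {
    ViewArray {
        views: vec![
            View::Inline("short".to_string()),
            View::Buffered { buffer: 0, offset: 0, len: 32 },
        ],
        data_buffers: vec![vec![b'x'; 1024]],
    }
}
```

   In this model the buffer-length sum is far larger than the referenced 
bytes, and it also ignores inline strings entirely, so it can be off in 
either direction depending on the data.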



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]