neilconway commented on code in PR #20317:
URL: https://github.com/apache/datafusion/pull/20317#discussion_r2822606960


##########
datafusion/functions/src/string/concat.rs:
##########
@@ -207,7 +207,7 @@ impl ScalarUDFImpl for ConcatFunc {
                         DataType::Utf8View => {
                             let string_array = as_string_view_array(array)?;
 
-                            data_size += string_array.len();
+                            data_size += 
string_array.total_buffer_bytes_used();

Review Comment:
   Fair point.
   
   On your suggested fix: summing the entire 128 bit view seems wrong (we don't 
want to add the buffer indexes and offsets, etc.). I think it would work to 
just use the low 32 bits of each view, but I'm leery of depending on an Arrow 
implementation detail like that.
   
    We could iterate over the strings and sum their length, but that seems like 
overkill for a buffer sizing hint.
   
   How about:
   1. Use total_buffer_bytes_used() for now; this is just a capacity hint, so 
an estimate is okay
   2. Add a comment for the short-strings case
   3. Open an issue in Arrow to have them add a helper for this use-case, as it 
seems reasonably useful in general



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to