lyne7-sc commented on code in PR #19547:
URL: https://github.com/apache/datafusion/pull/19547#discussion_r2658728261
##########
datafusion/functions/src/string/concat.rs:
##########
@@ -206,7 +207,11 @@ impl ScalarUDFImpl for ConcatFunc {
DataType::Utf8View => {
let string_array = as_string_view_array(array)?;
- data_size += string_array.len();
+ data_size += string_array
+ .data_buffers()
+ .iter()
+ .map(|buf| buf.len())
+ .sum::<usize>();
Review Comment:
> It's a hard tradeoff to be sure as now we have to iterate the whole array 🤔
>
> I would be curious to see what the benchmarks say; I'm not too sure on
> this myself, would love it if there were an easy way to estimate view size 😅
Hi @Jefffrey, I ran a benchmark comparing the pre-estimation logic
(iterating buffers/views) against the current implementation.
```
group                               concat_main_branch                 concat_perf_data_buffers           concat_perf_views_iter
-----                               ------------------                 ------------------------           ----------------------
concat function/concat_view/1024    1.00   109.7±3.28µs  ? ?/sec       1.14   125.3±3.71µs  ? ?/sec       1.04   113.6±4.48µs  ? ?/sec
concat function/concat_view/4096    1.02   608.1±20.13µs ? ?/sec       1.00   595.9±15.17µs ? ?/sec       1.01   603.1±19.92µs ? ?/sec
concat function/concat_view/8192    1.00  1206.7±52.12µs ? ?/sec       1.06  1281.5±44.05µs ? ?/sec       1.06  1277.6±37.06µs ? ?/sec
```
The results show that the cost of iterating outweighs the allocation
savings, leading to a slight regression in some cases (see the benchmark
table above). At 1024 rows, summing the data buffers is about 14% slower than
main and iterating the views about 4% slower; at 8192 rows both are about 6%
slower; at 4096 rows all three are roughly equal. Given StringView's design,
the default growth strategy seems more efficient here, so I've reverted that
part.
We might want to revisit this optimization later if we find a more
efficient way to determine the total size of the view data.
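
For anyone following along, here is a rough, std-only sketch of why neither estimate is both cheap and exact. It is a toy model, not the arrow-rs API: `ToyViewArray`, `View`, `estimate_from_buffers`, and `estimate_from_views` are hypothetical names made up for illustration. The only thing it takes from the Arrow format is that strings of at most 12 bytes are stored inline in the view and never touch a data buffer.

```rust
/// Strings up to this many bytes are stored inside the view itself.
const MAX_INLINE_LEN: usize = 12;

/// Simplified stand-in for one string view (hypothetical, not the arrow-rs type).
/// A real view also carries a prefix, a buffer index, and an offset.
enum View {
    /// Short string: the bytes live in the view, not in a data buffer.
    Inline { len: u32 },
    /// Long string: the bytes live in a shared data buffer.
    Buffer { len: u32 },
}

/// Hypothetical toy array: a list of views plus the buffers they point into.
struct ToyViewArray {
    views: Vec<View>,
    data_buffers: Vec<Vec<u8>>,
}

impl ToyViewArray {
    fn from_strs(values: &[&str]) -> Self {
        let mut buffer = Vec::new();
        let mut views = Vec::with_capacity(values.len());
        for v in values {
            let len = v.len() as u32;
            if v.len() <= MAX_INLINE_LEN {
                views.push(View::Inline { len });
            } else {
                buffer.extend_from_slice(v.as_bytes());
                views.push(View::Buffer { len });
            }
        }
        Self { views, data_buffers: vec![buffer] }
    }

    /// Mimics a slice or filter: drops views but leaves the buffers untouched.
    fn truncate_views(&mut self, n: usize) {
        self.views.truncate(n);
    }

    /// Cost grows with the number of buffers, but inline strings are not
    /// counted and unreferenced buffer bytes are.
    fn estimate_from_buffers(&self) -> usize {
        self.data_buffers.iter().map(|b| b.len()).sum()
    }

    /// Exact, but has to visit every view in the array.
    fn estimate_from_views(&self) -> usize {
        self.views
            .iter()
            .map(|v| match v {
                View::Inline { len } | View::Buffer { len } => *len as usize,
            })
            .sum()
    }
}

fn main() {
    let mut array = ToyViewArray::from_strs(&[
        "short",                              // 5 bytes, inline
        "also short",                         // 10 bytes, inline
        "this value is longer than 12 bytes", // 34 bytes, in the buffer
    ]);
    // Buffers undercount: the two inline strings are invisible to them.
    println!("buffers = {}", array.estimate_from_buffers()); // 34
    println!("views   = {}", array.estimate_from_views()); // 49

    // After dropping the long value, buffers overcount instead.
    array.truncate_views(2);
    println!("buffers = {}", array.estimate_from_buffers()); // 34
    println!("views   = {}", array.estimate_from_views()); // 15
}
```

In this model the buffer-based sum is off in both directions, and the view-based sum pays for a full pass over the array, which matches the tradeoff discussed in this thread.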