Rachelint commented on issue #11281:
URL: https://github.com/apache/datafusion/issues/11281#issuecomment-2212319819
I got strange results from my PoC...
After @efredine fixed the `filter_map` bug in #11295, we can use
`StringArray::from_iter` to replace `collect + StringArray::from`.
`StringArray::from_iter` is actually implemented like this:
```Rust
let iter = iter.into_iter();
let mut builder = GenericByteBuilder::with_capacity(iter.size_hint().0, 1024);
builder.extend(iter);
builder.finish()
```
So in theory, `from_iter` should be at least no worse than using
`StringBuilder` directly, as in
[https://github.com/apache/datafusion/pull/11136#discussion_r1657725214](https://github.com/apache/datafusion/pull/11136#discussion_r1657725214)...
However, the bench results are:
- use `collect + from`, as in the original code:
```
Extract data page statistics for String/extract_statistics/String
time: [71.441 µs 71.748 µs 72.038 µs]
change: [-0.3091% +0.1457% +0.6048%] (p = 0.54 > 0.05)
```
- use `from_iter`:
```
Extract data page statistics for String/extract_statistics/String
time: [41.303 µs 41.358 µs 41.415 µs]
```
- use `StringBuilder` directly:
```
Extract data page statistics for String/extract_statistics/String
time: [15.471 µs 15.489 µs 15.508 µs]
```
The difference I have found so far is that `StringBuilder::new` uses a default
capacity of 1024 to initialize the buffer, while `from_iter` uses `size_hint()`
instead... Maybe it is related to that?
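
To make the `size_hint()` point concrete, here is a minimal std-only sketch of how the lower bound (the value the quoted snippet passes as the item capacity) changes with the iterator adapter. The `lower_bound` helper and the sample data are hypothetical, and whether the iterator in the PoC actually behaves like the `filter_map` case is an assumption, not something measured here.

```rust
// Hypothetical helper: returns the lower bound of `size_hint()`, i.e. the
// value the quoted `from_iter` snippet uses as the item capacity.
fn lower_bound<I: Iterator>(iter: &I) -> usize {
    iter.size_hint().0
}

fn main() {
    // Made-up sample data standing in for optional string statistics.
    let values: Vec<Option<&str>> = vec![Some("a"), None, Some("ccc"), Some("dd")];

    // A plain slice iterator knows its exact length, so the lower bound is 4.
    let exact = values.iter();
    println!("slice iter lower bound: {}", lower_bound(&exact));

    // `filter_map` may drop any element, so it can only promise a lower
    // bound of 0, even though 3 items will actually be yielded.
    let filtered = values.iter().filter_map(|v| *v);
    println!("filter_map lower bound: {}", lower_bound(&filtered));
    println!("items actually yielded: {}", filtered.count());
}
```

If the lower bound is small, a builder sized from it starts with little item capacity and has to grow while extending. This sketch only shows where such a gap could come from; it does not show that this explains the benchmark numbers above.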