neilconway opened a new issue, #21539:
URL: https://github.com/apache/datafusion/issues/21539
### Is your feature request related to a problem or challenge?
We have `StringViewArrayBuilder` and `StringArrayBuilder`, which are
optimized versions of the corresponding string builders in Arrow. However, our
versions are only used in two places; we use the Arrow versions much more
often, partly because our versions have a very narrow API (you can only pass a
`ColumnValueRef`). That means we can't use our builders in situations where
the caller has transformed the column's value, which is common. Roughly, that
covers all of these places:
| File | Lines | Code path |
|---|---|---|
| `string/common.rs` (case_conversion Utf8View path) | 359-372 | `to_upper`/`to_lower` for Utf8View |
| `unicode/initcap.rs` | 166-172, 238-244 | Non-ASCII `initcap` for Utf8/LargeUtf8 and Utf8View |
| `unicode/reverse.rs` | 135-153 | `reverse` for all string types |
| `unicode/translate.rs` | 225-263, 319-357 | `translate` both all-array and scalar-optimized paths |
| `unicode/substrindex.rs` | 183-241 | `substr_index` all-array path |
| `string/replace.rs` | 166-181, 194-209 | `replace` for Utf8View and generic string |
| `unicode/lpad.rs` | 237, 294, 454, 510 | Various `lpad` code paths |
| `unicode/rpad.rs` | 238, 296, 454, 511 | Various `rpad` code paths |
| `datetime/to_char.rs` | 209-210, 252-253 | `to_char` scalar and array format paths |
| `string/repeat.rs` | 329 | `repeat` for generic string |
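For illustration, call sites that transform values tend to follow one shape: for each row, check for NULL, compute a transformed value, and append it. The sketch below models that shape with a hypothetical, std-only `RowBuilder` standing in for an Arrow-style builder (so it compiles without the `arrow` crate). `RowBuilder`, `reverse_per_row`, and the scratch-`String` approach are assumptions, not the actual DataFusion code.

```rust
/// Hypothetical, std-only stand-in for an Arrow-style string builder:
/// a values buffer, an i32 offsets buffer, and one validity flag per row.
struct RowBuilder {
    values: Vec<u8>,
    offsets: Vec<i32>,
    validity: Vec<bool>,
}

impl RowBuilder {
    fn new() -> Self {
        // Offsets always start at 0; row i spans offsets[i]..offsets[i + 1].
        RowBuilder { values: Vec::new(), offsets: vec![0], validity: Vec::new() }
    }

    fn append_value(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        self.offsets.push(self.values.len() as i32);
        self.validity.push(true);
    }

    fn append_null(&mut self) {
        // A NULL row is a zero-length slot with its validity flag cleared.
        self.offsets.push(self.values.len() as i32);
        self.validity.push(false);
    }
}

/// Per-row shape: branch on NULL for every row, build the transformed value
/// in a scratch String, then copy it into the builder.
fn reverse_per_row(input: &[Option<&str>]) -> RowBuilder {
    let mut builder = RowBuilder::new();
    let mut scratch = String::new();
    for value in input {
        match value {
            Some(s) => {
                scratch.clear();
                scratch.extend(s.chars().rev());
                builder.append_value(&scratch);
            }
            None => builder.append_null(),
        }
    }
    builder
}
```

Here validity is recorded one row at a time, which is the per-row NULL handling the issue proposes to replace with a bulk computation.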
If we extended the API of our builders, we could use them in all or most of
these places, which would yield a nice performance win (e.g., because the NULL
computation would be done in bulk rather than per row). We'd need to add
something like `append_value(&str)`, `write_str()` / `write_char()` /
`finish_value()`, and `append_empty()` (a placeholder for NULLs).
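A minimal sketch of what such an extended API could look like, assuming a Utf8-style layout (one `i32` offsets buffer plus one values buffer). `SketchStringBuilder`, its `with_capacity` and `finish` signatures, the `Vec<bool>` validity representation, and the `reverse` helper are all hypothetical; only the method names `append_value`, `write_str`, `write_char`, `finish_value`, and `append_empty` come from the suggestion above. Implementing `std::fmt::Write` provides `write_str`/`write_char` (and `write!`), and the builder never tracks validity itself: the caller passes a null buffer computed once for the whole batch.

```rust
use std::fmt::{self, Write};

/// Hypothetical builder: values are written straight into `values`, and each
/// `finish_value`/`append_empty` closes one row by pushing an offset.
struct SketchStringBuilder {
    values: Vec<u8>,
    offsets: Vec<i32>, // i32 as in Utf8; this sketch does not check for overflow
}

impl SketchStringBuilder {
    fn with_capacity(rows: usize, bytes: usize) -> Self {
        let mut offsets = Vec::with_capacity(rows + 1);
        offsets.push(0);
        Self { values: Vec::with_capacity(bytes), offsets }
    }

    /// Append a complete value in one call.
    fn append_value(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        self.finish_value();
    }

    /// Close the value built so far with `write_str`/`write_char`/`write!`.
    fn finish_value(&mut self) {
        self.offsets.push(self.values.len() as i32);
    }

    /// Placeholder for a NULL row: a zero-length slot. Whether the row is
    /// NULL is decided by the null buffer passed to `finish`.
    fn append_empty(&mut self) {
        self.finish_value();
    }

    /// Consume the builder, attaching a null buffer (`true` = valid) that the
    /// caller computed for the whole batch rather than row by row.
    fn finish(self, nulls: Option<Vec<bool>>) -> (Vec<i32>, Vec<u8>, Option<Vec<bool>>) {
        if let Some(n) = &nulls {
            assert_eq!(n.len() + 1, self.offsets.len(), "null buffer must match row count");
        }
        (self.offsets, self.values, nulls)
    }
}

/// Writing appends to the in-progress value, with no per-row scratch String.
impl Write for SketchStringBuilder {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.values.extend_from_slice(s.as_bytes());
        Ok(())
    }

    fn write_char(&mut self, c: char) -> fmt::Result {
        let mut buf = [0u8; 4];
        self.values.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
        Ok(())
    }
}

/// Hypothetical `reverse` kernel: the input validity is reused as the output
/// null buffer in one step; the loop only decides whether to write characters.
fn reverse(input: &[&str], validity: Option<&[bool]>) -> (Vec<i32>, Vec<u8>, Option<Vec<bool>>) {
    let total: usize = input.iter().map(|s| s.len()).sum();
    let mut builder = SketchStringBuilder::with_capacity(input.len(), total);
    for (i, s) in input.iter().enumerate() {
        if validity.map_or(true, |v| v[i]) {
            for c in s.chars().rev() {
                builder.write_char(c).unwrap();
            }
            builder.finish_value();
        } else {
            builder.append_empty();
        }
    }
    builder.finish(validity.map(|v| v.to_vec()))
}
```

Because NULL rows are just zero-length slots from `append_empty`, the output null buffer can be derived from the inputs in a single pass (here it is simply reused) instead of being assembled one `append_null` call at a time.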
### Describe the solution you'd like
_No response_
### Describe alternatives you've considered
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.