neilconway opened a new issue, #21539:
URL: https://github.com/apache/datafusion/issues/21539

   ### Is your feature request related to a problem or challenge?
   
   We have `StringViewArrayBuilder` and `StringArrayBuilder`, which are 
optimized versions of the corresponding string builders in Arrow. However, 
our versions are only used in two places; we use the Arrow versions much more 
often, partly because our versions have a very narrow API (you can only pass a 
`ColumnValueRef`). That means we can't use our builders when the caller has 
transformed the value of the column, which is pretty common. It happens in 
roughly all of these places:
   
   ```
     | `string/common.rs` (case_conversion Utf8View path) | 359-372 | `to_upper`/`to_lower` for Utf8View |
     | `unicode/initcap.rs` | 166-172, 238-244 | Non-ASCII `initcap` for Utf8/LargeUtf8 and Utf8View |
     | `unicode/reverse.rs` | 135-153 | `reverse` for all string types |
     | `unicode/translate.rs` | 225-263, 319-357 | `translate` both all-array and scalar-optimized paths |
     | `unicode/substrindex.rs` | 183-241 | `substr_index` all-array path |
     | `string/replace.rs` | 166-181, 194-209 | `replace` for Utf8View and generic string |
     | `unicode/lpad.rs` | 237, 294, 454, 510 | Various `lpad` code paths |
     | `unicode/rpad.rs` | 238, 296, 454, 511 | Various `rpad` code paths |
     | `datetime/to_char.rs` | 209-210, 252-253 | `to_char` scalar and array format paths |
     | `string/repeat.rs` | 329 | `repeat` for generic string |
   ```
   
   If we extended the API of our builders, we could use them in most or all 
of these places, which would yield a nice perf win (e.g., because we would do 
the NULL computation in bulk, not per-row). We'd need to add something like 
`append_value(&str)`, `write_str()` / `write_char()` / `finish_value()`, and 
`append_empty()` (a placeholder for NULLs).
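
To make the proposed API shape concrete, here is a minimal, std-only sketch. It is not DataFusion's `StringArrayBuilder`: the types `SketchStringBuilder` and `SketchStringArray`, the `finish(validity)` signature, and the `reverse_all` helper are all hypothetical, and a `Vec<bool>` stands in for an Arrow null buffer. It only illustrates how `append_value`, `write_str` / `write_char` / `finish_value`, and `append_empty` could fit together when validity is supplied once at the end rather than tracked per row.

```rust
use std::fmt::Write;

/// Hypothetical builder, loosely modeled on a Utf8 offsets + values layout.
pub struct SketchStringBuilder {
    offsets: Vec<i32>, // offsets[i]..offsets[i + 1] delimits row i
    values: String,    // concatenated UTF-8 data for all rows
}

/// Hypothetical finished array; `validity` stands in for a null buffer.
pub struct SketchStringArray {
    offsets: Vec<i32>,
    values: String,
    validity: Option<Vec<bool>>,
}

impl SketchStringBuilder {
    pub fn with_capacity(rows: usize, bytes: usize) -> Self {
        let mut offsets = Vec::with_capacity(rows + 1);
        offsets.push(0);
        Self { offsets, values: String::with_capacity(bytes) }
    }

    /// Append a complete, already-transformed value as one row.
    pub fn append_value(&mut self, s: &str) {
        self.values.push_str(s);
        self.finish_value();
    }

    /// Close the row written so far via `write_str` / `write_char`.
    pub fn finish_value(&mut self) {
        assert!(self.values.len() <= i32::MAX as usize, "offset overflow");
        self.offsets.push(self.values.len() as i32);
    }

    /// Zero-length placeholder row, intended for NULL slots.
    /// Only meaningful between rows, not in the middle of a value.
    pub fn append_empty(&mut self) {
        self.finish_value();
    }

    /// Validity is attached once here, not computed row by row.
    pub fn finish(self, validity: Option<Vec<bool>>) -> SketchStringArray {
        if let Some(v) = &validity {
            assert_eq!(v.len(), self.offsets.len() - 1);
        }
        SketchStringArray { offsets: self.offsets, values: self.values, validity }
    }
}

// `fmt::Write` supplies `write_char` (and `write!`) on top of `write_str`.
impl Write for SketchStringBuilder {
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        self.values.push_str(s);
        Ok(())
    }
}

impl SketchStringArray {
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    pub fn get(&self, i: usize) -> Option<&str> {
        if let Some(v) = &self.validity {
            if !v[i] {
                return None;
            }
        }
        let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
        Some(&self.values[start..end])
    }
}

/// Example caller that transforms each value (a `reverse`-style function).
pub fn reverse_all(input: &[Option<&str>]) -> SketchStringArray {
    let mut builder = SketchStringBuilder::with_capacity(input.len(), 0);
    for row in input {
        match row {
            Some(s) => {
                for c in s.chars().rev() {
                    builder.write_char(c).unwrap();
                }
                builder.finish_value();
            }
            None => builder.append_empty(),
        }
    }
    // Assumption: with real Arrow arrays the input's null buffer could be reused as-is.
    let validity: Vec<bool> = input.iter().map(|r| r.is_some()).collect();
    builder.finish(Some(validity))
}
```

In this sketch the per-row loop never touches validity; NULL rows only get an empty placeholder so the offsets stay aligned, which is the property the bulk NULL handling relies on.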
   
   ### Describe the solution you'd like
   
   _No response_
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

