neilconway opened a new pull request, #21538:
URL: https://github.com/apache/datafusion/pull/21538
## Which issue does this PR close?
- Closes #21537.
## Rationale for this change
`StringViewArrayBuilder` is implemented on top of Arrow's
`StringViewBuilder`, which tracks NULLs incrementally. However,
`StringViewArrayBuilder` requires callers to pass a NULL buffer to `finish()`
anyway, so the NULL bitmap computed by `StringViewBuilder` is discarded. It
would be more efficient to stop using `StringViewBuilder` and avoid this
redundant work. Maintaining two NULL bitmaps also leaves room, in theory, for
them to become inconsistent.
Right now, `StringViewArrayBuilder` is only used by the `concat` and
`concat_ws` UDFs, but I'd like to generalize the API and use it more broadly in
place of `StringViewBuilder`. For the time being, here are the results of this
PR on the `concat` benchmarks (Arm64):
```
- 1024 rows: 29.6 µs → 28.0 µs, -5.3%
- 4096 rows: 134.3 µs → 125.6 µs, -6.5%
- 8192 rows: 289.7 µs → 273.5 µs, -5.6%
```
## What changes are included in this PR?
* Stop using `StringViewBuilder` and build the views ourselves
* Improve some comments
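
For readers unfamiliar with the view layout, here is a rough, std-only sketch of what "building the views ourselves" can look like. It is not the code in this PR: `make_view` and `SketchViewBuilder` are hypothetical names, and the sketch assumes a single data buffer (index 0). It follows the Arrow string-view layout, where each view is 16 bytes, values of up to 12 bytes are stored inline, and longer values store a 4-byte prefix plus a buffer index and offset. Note that no NULL state is tracked; the caller is expected to supply the NULL buffer separately.

```rust
/// Hypothetical helper: pack one value into a 16-byte Arrow-style view.
/// Bytes 0..4 hold the length (little-endian).
/// Short values (<= 12 bytes) are stored inline in bytes 4..16.
/// Long values store a 4-byte prefix, a buffer index, and an offset.
fn make_view(value: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    if value.len() <= 12 {
        // Inline case: the remaining bytes stay zero-padded.
        bytes[4..4 + value.len()].copy_from_slice(value);
    } else {
        bytes[4..8].copy_from_slice(&value[..4]);
        bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(bytes)
}

/// Hypothetical builder sketch: collects views and out-of-line bytes only.
/// It deliberately does no NULL bookkeeping.
struct SketchViewBuilder {
    views: Vec<u128>,
    data: Vec<u8>, // assumed to be the only data buffer (index 0)
}

impl SketchViewBuilder {
    fn new() -> Self {
        Self { views: Vec::new(), data: Vec::new() }
    }

    fn append(&mut self, s: &str) {
        let offset = self.data.len() as u32;
        if s.len() > 12 {
            // Only long values are copied into the data buffer.
            self.data.extend_from_slice(s.as_bytes());
        }
        self.views.push(make_view(s.as_bytes(), 0, offset));
    }
}
```

A real implementation would also need to handle multiple data buffers once one fills up, which this sketch ignores.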
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]