neilconway opened a new pull request, #21519:
URL: https://github.com/apache/datafusion/pull/21519
## Which issue does this PR close?
- Closes #21518.
## Rationale for this change
Like several other recent changes, this one targets per-row NULL handling. `substr` currently checks for NULLs and builds the result NULL bitmap row by row. It is faster to compute the result NULL bitmap in bulk instead, by bitwise-ANDing the NULL bitmaps of the input arguments.
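To illustrate the difference, here is a minimal std-only Rust sketch. It is not taken from the PR. It models a validity bitmap as packed `u64` words, where a set bit means the row is non-NULL. The `Bitmap` type and both functions are hypothetical stand-ins for Arrow's null buffers. A `substr` result row is NULL if any argument is NULL, so the result bitmap is the AND of the argument bitmaps. That AND can be computed 64 rows at a time instead of one row at a time. The sketch assumes at least one input and that all inputs have the same length.

```rust
/// Hypothetical packed validity bitmap: bit i set => row i is valid (non-NULL).
#[derive(Debug, Clone, PartialEq)]
struct Bitmap {
    words: Vec<u64>,
    len: usize,
}

impl Bitmap {
    fn from_bools(valid: &[bool]) -> Self {
        let mut words = vec![0u64; (valid.len() + 63) / 64];
        for (i, &v) in valid.iter().enumerate() {
            if v {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        Bitmap { words, len: valid.len() }
    }

    fn is_valid(&self, i: usize) -> bool {
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }
}

/// Per-row approach: inspect every input's bit for each row, then set the output bit.
fn nulls_per_row(inputs: &[&Bitmap]) -> Bitmap {
    let len = inputs[0].len;
    let valid: Vec<bool> = (0..len)
        .map(|i| inputs.iter().all(|b| b.is_valid(i)))
        .collect();
    Bitmap::from_bools(&valid)
}

/// Bulk approach: a row is valid only if it is valid in every input,
/// so AND the bitmaps together one 64-bit word (64 rows) at a time.
fn nulls_bulk(inputs: &[&Bitmap]) -> Bitmap {
    let mut words = inputs[0].words.clone();
    for b in &inputs[1..] {
        for (w, other) in words.iter_mut().zip(&b.words) {
            *w &= *other;
        }
    }
    Bitmap { words, len: inputs[0].len }
}

fn main() {
    // Validity of the string, start, and count arguments for five rows.
    let s = Bitmap::from_bools(&[true, false, true, true, true]);
    let start = Bitmap::from_bools(&[true, true, false, true, true]);
    let count = Bitmap::from_bools(&[true, true, true, true, false]);
    let bulk = nulls_bulk(&[&s, &start, &count]);
    assert_eq!(bulk, nulls_per_row(&[&s, &start, &count]));
    println!("{:05b}", bulk.words[0]); // prints 01001 (rows 0 and 3 are valid)
}
```

In Arrow itself, combining two null buffers this way is what `NullBuffer::union` provides; whether this PR uses that particular helper is not stated here.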
Benchmarks (ARM64):
```
- substr, no count, short strings/substr_large_string [size=1024]: 21.4µs → 20.9µs (-2.3%)
- substr, no count, short strings/substr_large_string [size=4096]: 83.1µs → 83.0µs (-0.1%)
- substr, no count, short strings/substr_string [size=1024]: 20.5µs → 19.8µs (-3.4%)
- substr, no count, short strings/substr_string [size=4096]: 78.8µs → 77.0µs (-2.3%)
- substr, no count, short strings/substr_string_view [size=1024]: 18.9µs → 16.1µs (-14.8%)
- substr, no count, short strings/substr_string_view [size=4096]: 74.0µs → 61.6µs (-16.8%)
- substr, scalar args, long strings/substr_large_string [size=1024]: 35.2µs → 34.0µs (-3.4%)
- substr, scalar args, long strings/substr_large_string [size=4096]: 140.6µs → 134.5µs (-4.3%)
- substr, scalar args, long strings/substr_string [size=1024]: 35.5µs → 33.8µs (-4.8%)
- substr, scalar args, long strings/substr_string [size=4096]: 138.9µs → 134.2µs (-3.4%)
- substr, scalar args, long strings/substr_string_view [size=1024]: 34.0µs → 31.0µs (-8.8%)
- substr, scalar args, long strings/substr_string_view [size=4096]: 132.0µs → 121.8µs (-7.7%)
- substr, scalar args, short strings/substr_string [size=1024]: 31.0µs → 29.2µs (-5.8%)
- substr, scalar args, short strings/substr_string [size=4096]: 120.8µs → 111.5µs (-7.7%)
- substr, scalar args, short strings/substr_string_view [size=1024]: 26.8µs → 23.1µs (-13.8%)
- substr, scalar args, short strings/substr_string_view [size=4096]: 101.6µs → 86.4µs (-14.9%)
- substr, scalar start, no count, long strings/substr_string [size=1024]: 34.5µs → 33.2µs (-3.8%)
- substr, scalar start, no count, long strings/substr_string [size=4096]: 134.4µs → 133.6µs (-0.6%)
- substr, scalar start, no count, long strings/substr_string_view [size=1024]: 32.9µs → 29.4µs (-10.6%)
- substr, scalar start, no count, long strings/substr_string_view [size=4096]: 126.1µs → 115.2µs (-8.6%)
- substr, scalar start, no count, short strings/substr_string [size=1024]: 20.9µs → 20.1µs (-3.8%)
- substr, scalar start, no count, short strings/substr_string [size=4096]: 80.1µs → 77.5µs (-3.2%)
- substr, scalar start, no count, short strings/substr_string_view [size=1024]: 19.9µs → 16.7µs (-16.1%)
- substr, scalar start, no count, short strings/substr_string_view [size=4096]: 74.4µs → 62.4µs (-16.1%)
- substr, short count, long strings/substr_large_string [size=1024]: 30.3µs → 28.4µs (-6.3%)
- substr, short count, long strings/substr_large_string [size=4096]: 117.1µs → 112.0µs (-4.4%)
- substr, short count, long strings/substr_string [size=1024]: 30.2µs → 28.3µs (-6.3%)
- substr, short count, long strings/substr_string [size=4096]: 118.0µs → 111.0µs (-5.9%)
- substr, short count, long strings/substr_string_view [size=1024]: 26.1µs → 22.8µs (-12.6%)
- substr, short count, long strings/substr_string_view [size=4096]: 101.5µs → 87.7µs (-13.6%)
- substr, with count, long strings/substr_large_string [size=1024]: 34.6µs → 32.8µs (-5.2%)
- substr, with count, long strings/substr_large_string [size=4096]: 136.7µs → 133.0µs (-2.7%)
- substr, with count, long strings/substr_string [size=1024]: 34.2µs → 32.7µs (-4.4%)
- substr, with count, long strings/substr_string [size=4096]: 136.6µs → 132.3µs (-3.1%)
- substr, with count, long strings/substr_string_view [size=1024]: 33.3µs → 30.3µs (-9.0%)
- substr, with count, long strings/substr_string_view [size=4096]: 129.1µs → 119.6µs (-7.4%)
```
## What changes are included in this PR?
* Compute the `substr` result NULL bitmap in bulk via bitwise AND, rather than row by row.
* Rename `make_and_append_view` to `append_view` and move NULL handling into its callers. Handling NULLs inside `append_view` encourages per-row NULL computation, which should be avoided when possible.
* Mark `append_view` as never-inline (`#[inline(never)]`). This avoids a performance regression on some of the `substr` microbenchmarks, where LLVM otherwise tends to inline this fairly large function into a hot loop.
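The `append_view` refactoring described in the list above can be sketched as follows. This is a hypothetical, std-only Rust illustration, not DataFusion's code: the `append_view` signature, the `substr_views` caller, and the `(offset, length)` row representation are all assumptions. Views follow Arrow's 16-byte StringView layout (a 4-byte length, then either up to 12 inline bytes or a 4-byte prefix, a buffer index, and a byte offset). `append_view` only encodes and pushes a view; the caller consults a validity slice that is assumed to have been computed in bulk beforehand, and `#[inline(never)]` keeps the helper out of the hot loop. For brevity, the substring logic is byte-based and assumes ASCII input, whereas SQL `substr` counts characters.

```rust
/// Hypothetical helper: encode `substr` as an Arrow-style 16-byte string view.
/// Bytes 0..4 hold the length. If length <= 12 the data is stored inline in
/// bytes 4..16; otherwise bytes 4..8 hold a 4-byte prefix, 8..12 the data
/// buffer index, and 12..16 the byte offset into that buffer.
/// No NULL handling here: callers are responsible for NULLs.
#[inline(never)]
fn append_view(views: &mut Vec<u128>, substr: &str, buffer_index: u32, offset: u32) {
    let bytes = substr.as_bytes();
    let mut view = [0u8; 16];
    view[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
    if bytes.len() <= 12 {
        view[4..4 + bytes.len()].copy_from_slice(bytes);
    } else {
        view[4..8].copy_from_slice(&bytes[..4]);
        view[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        view[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    views.push(u128::from_le_bytes(view));
}

/// Hypothetical caller: `rows` are (offset, length) ranges into `data`
/// (buffer 0), and `valid` is a validity mask assumed to be computed in bulk.
/// Assumes ASCII input, so byte positions equal character positions.
fn substr_views(data: &str, rows: &[(u32, u32)], start: u32, valid: &[bool]) -> Vec<u128> {
    let mut views = Vec::with_capacity(rows.len());
    for (&(off, len), &is_valid) in rows.iter().zip(valid) {
        if !is_valid {
            // NULL row: push an empty placeholder view; the validity bitmap
            // built in bulk is what marks this row as NULL.
            views.push(0);
            continue;
        }
        let skip = start.min(len);
        let sub = &data[(off + skip) as usize..(off + len) as usize];
        // A long substring points back into the original buffer at the new offset.
        append_view(&mut views, sub, 0, off + skip);
    }
    views
}

fn main() {
    let data = "abcdefghijklmnopqrstuvwxyz";
    let rows: [(u32, u32); 3] = [(0, 5), (5, 20), (0, 26)];
    let valid = [true, true, false]; // the third row is NULL
    let views = substr_views(data, &rows, 2, &valid);
    let lens: Vec<u32> = views.iter().map(|v| *v as u32).collect();
    println!("{:?}", lens); // prints [3, 18, 0]
}
```

Whether `#[inline(never)]` pays off depends on the compiler and the workload; the PR reports that it avoided a regression on some `substr` microbenchmarks.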
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No.
--
This is an automated message from the Apache Git Service.