bziobrowski opened a new pull request, #14833:
URL: https://github.com/apache/pinot/pull/14833
This PR optimizes a number of string scalar functions, including:
- ltrim
- rtrim
- unique_ngrams
- concat
- concat_ws
It also adds new versions of other functions, optimized on the assumption that the pattern is constant:
- regexp_replace_const
- regexp_like_const
- regexp_extract_const
- replace_const
- like_const
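A minimal sketch of what "assuming the pattern is constant" can mean for `regexp_replace_const`, written under stated assumptions rather than taken from this PR: the class name, method signature, and lazy initialization are hypothetical. The pattern is compiled once, on the first call, and the same `java.util.regex.Matcher` is rebound to each new input with `Matcher.reset(CharSequence)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a "constant pattern" regexp_replace (not Pinot code).
// ASSUMPTIONS: the caller passes the same regex and replacement on every call,
// and each instance is used by a single thread (Matcher is mutable state).
final class RegexpReplaceConst {
  private Matcher _matcher;
  private String _replacement;

  String regexpReplace(String input, String regex, String replacement) {
    if (_matcher == null) {
      // First call only: compile the pattern and bind it to an empty input.
      _matcher = Pattern.compile(regex).matcher("");
      _replacement = replacement;
    }
    // Every call: rebind the existing Matcher to the new input string.
    return _matcher.reset(input).replaceAll(_replacement);
  }

  public static void main(String[] args) {
    RegexpReplaceConst fn = new RegexpReplaceConst();
    System.out.println(fn.regexpReplace("quick", "[aeiou]", "_")); // q__ck
    System.out.println(fn.regexpReplace("brown", "[aeiou]", "_")); // br_wn
  }
}
```

Because the cached `Matcher` carries state, this design only works if each instance is confined to one thread and always receives the same pattern argument.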
All of the functions mentioned above have been changed to initialize temporary objects once and then clear and reuse them on each call.
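The "initialize once, then clear and reuse" approach can be sketched for `concat_ws` as well. This is a hypothetical illustration, not the PR's code; the `ConcatWs` class and its varargs signature are assumptions. A single `StringBuilder` is allocated per instance and cleared with `setLength(0)`, so later calls reuse its already-grown capacity instead of allocating a new buffer.

```java
// Hypothetical sketch of buffer reuse in concat_ws (not Pinot code).
// ASSUMPTION: one instance per thread, because the buffer is mutable state.
final class ConcatWs {
  private final StringBuilder _buffer = new StringBuilder();

  String concatWs(String separator, String... inputs) {
    _buffer.setLength(0); // clear contents but keep the allocated capacity
    for (int i = 0; i < inputs.length; i++) {
      if (i > 0) {
        _buffer.append(separator);
      }
      _buffer.append(inputs[i]);
    }
    return _buffer.toString();
  }

  public static void main(String[] args) {
    ConcatWs fn = new ConcatWs();
    System.out.println(fn.concatWs("-", "a", "b", "c")); // a-b-c
    System.out.println(fn.concatWs(",", "x", "y"));      // x,y
  }
}
```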
As the following benchmark output shows, this change can speed up a raw function call by nearly 4x (101.532 vs. 25.720 us/op for the `q.[aeiou]c.*` pattern).
```
Benchmark                                      (_regex)      Mode  Cnt    Score    Error  Units
BenchmarkRegexpReplace.testRegexpReplaceConst  q.[aeiou]c.*  avgt    3   25.720 ±  0.262  us/op
BenchmarkRegexpReplace.testRegexpReplaceConst  .*a           avgt    3   92.530 ±  2.315  us/op
BenchmarkRegexpReplace.testRegexpReplaceConst  b.*           avgt    3   34.444 ±  3.076  us/op
BenchmarkRegexpReplace.testRegexpReplaceConst  .*            avgt    3   42.251 ±  1.791  us/op
BenchmarkRegexpReplace.testRegexpReplaceConst  .*ated        avgt    3  121.553 ±  1.767  us/op
BenchmarkRegexpReplace.testRegexpReplaceConst  .*ba.*        avgt    3  130.567 ±  1.258  us/op
BenchmarkRegexpReplace.testRegexpReplaceOld    q.[aeiou]c.*  avgt    3  101.532 ± 10.586  us/op
BenchmarkRegexpReplace.testRegexpReplaceOld    .*a           avgt    3  153.493 ±  8.621  us/op
BenchmarkRegexpReplace.testRegexpReplaceOld    b.*           avgt    3   75.913 ±  2.909  us/op
BenchmarkRegexpReplace.testRegexpReplaceOld    .*            avgt    3   75.989 ±  4.248  us/op
BenchmarkRegexpReplace.testRegexpReplaceOld    .*ated        avgt    3  214.719 ± 91.627  us/op
BenchmarkRegexpReplace.testRegexpReplaceOld    .*ba.*        avgt    3  212.798 ±  5.929  us/op
```
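To make the comparison explicit, the short program below recomputes the Old/Const ratio for each pattern from the scores in the table above. The arrays are copied from that output; nothing else is assumed.

```java
// Recomputes the Old/Const speedup per pattern from the JMH scores above
// (average time in us/op, lower is better).
final class RegexpReplaceSpeedup {
  static final String[] PATTERNS = {"q.[aeiou]c.*", ".*a", "b.*", ".*", ".*ated", ".*ba.*"};
  static final double[] CONST_US_PER_OP = {25.720, 92.530, 34.444, 42.251, 121.553, 130.567};
  static final double[] OLD_US_PER_OP = {101.532, 153.493, 75.913, 75.989, 214.719, 212.798};

  // Speedup = old time / new time for the pattern at index i.
  static double speedup(int i) {
    return OLD_US_PER_OP[i] / CONST_US_PER_OP[i];
  }

  public static void main(String[] args) {
    for (int i = 0; i < PATTERNS.length; i++) {
      System.out.printf("%-14s %.2fx%n", PATTERNS[i], speedup(i));
    }
  }
}
```

It reports speedups ranging from about 1.63x (`.*ba.*`) to 3.95x (`q.[aeiou]c.*`).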
If query processing is dominated by the function call, the effect on actual query performance is similar (12.351 vs. 42.298 ms/op, about 3.4x faster):
```
Benchmark                   (_numRows)  (_scenario)  Mode  Cnt    Score    Error  Units
BenchmarkQueriesMSQE.query     1500000   EXP(0.001)  avgt    5   12.351 ±  1.591  ms/op
    (_query) = select * from (select RAW_STRING_COL from MyTable limit 100000)
               where regexp_like_const('.*a.*', RAW_STRING_COL)
BenchmarkQueriesMSQE.query     1500000   EXP(0.001)  avgt    5   42.298 ±  3.225  ms/op
    (_query) = select * from (select RAW_STRING_COL from MyTable limit 100000)
               where regexp_like('.*a.*', RAW_STRING_COL)
```
NOTE: I added the `_const` functions because there is currently no way for the engine to choose an implementation based on whether a function argument is constant or variable. If we changed `regexp_replace` itself, for example, it would start returning wrong results whenever the regular expression is variable, without raising an error or warning.
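The risk described in the NOTE can be illustrated with a small, hypothetical example (neither class is Pinot code, and `Matcher.find()` semantics are assumed for `regexp_like`). A version that caches the first compiled pattern keeps using it when a later row supplies a different pattern, while a version that compiles on every call stays correct but repeats the compilation work.

```java
import java.util.regex.Pattern;

// Demo driver: feeds per-row (input, pattern) pairs, i.e. a variable pattern.
final class VariablePatternDemo {
  public static void main(String[] args) {
    String[][] rows = {{"abc", "a"}, {"xyz", "x"}};
    NaiveCachedRegexpLike cached = new NaiveCachedRegexpLike();
    PerCallRegexpLike perCall = new PerCallRegexpLike();
    for (String[] row : rows) {
      System.out.println(row[0] + " ~ " + row[1]
          + ": cached=" + cached.regexpLike(row[0], row[1])
          + ", perCall=" + perCall.regexpLike(row[0], row[1]));
    }
  }
}

// Hypothetical "constant pattern" variant: caches the first compiled pattern.
final class NaiveCachedRegexpLike {
  private Pattern _pattern;

  boolean regexpLike(String input, String regex) {
    if (_pattern == null) {
      _pattern = Pattern.compile(regex); // later regex values are silently ignored
    }
    return _pattern.matcher(input).find();
  }
}

// Hypothetical general variant: compiles the pattern on every call.
final class PerCallRegexpLike {
  boolean regexpLike(String input, String regex) {
    return Pattern.compile(regex).matcher(input).find(); // always honors regex
  }
}
```

Running `VariablePatternDemo` prints `cached=false, perCall=true` for the second row: the cached variant silently evaluates `xyz` against `a` instead of `x`, with no error or warning.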
--
This is an automated message from the Apache Git Service.