atris opened a new pull request #8237:
URL: https://github.com/apache/pinot/pull/8237
The native text engine minimises the query automaton after construction using
Hopcroft's algorithm. This can get expensive for large query automata, and
does not yield much improvement anyway, since the query automaton is built once
and used once. This change skips that minimisation step.
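As a rough illustration of why skipping minimisation is safe: a minimised automaton accepts exactly the same strings as the unminimised one, only with fewer states. The sketch below is not Pinot's native FST code. `MinimizationSketch`, `Dfa`, and the example language {"ac", "bc"} are hypothetical names and data chosen for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class MinimizationSketch {
    /** A tiny DFA: state 0 is the start state; a missing transition rejects. */
    static final class Dfa {
        private final Map<Integer, Map<Character, Integer>> transitions = new HashMap<>();
        private final boolean[] accepting;

        Dfa(int numStates) {
            accepting = new boolean[numStates];
        }

        Dfa edge(int from, char label, int to) {
            transitions.computeIfAbsent(from, k -> new HashMap<>()).put(label, to);
            return this;
        }

        Dfa accept(int state) {
            accepting[state] = true;
            return this;
        }

        int numStates() {
            return accepting.length;
        }

        boolean matches(String input) {
            int state = 0;
            for (char c : input.toCharArray()) {
                Map<Character, Integer> out = transitions.get(state);
                Integer next = (out == null) ? null : out.get(c);
                if (next == null) {
                    return false;
                }
                state = next;
            }
            return accepting[state];
        }
    }

    /** Accepts "ac" and "bc" using separate, redundant paths (5 states). */
    static Dfa unminimized() {
        return new Dfa(5)
            .edge(0, 'a', 1).edge(0, 'b', 2)
            .edge(1, 'c', 3).edge(2, 'c', 4)
            .accept(3).accept(4);
    }

    /** Same language after merging the equivalent states (3 states). */
    static Dfa minimized() {
        return new Dfa(3)
            .edge(0, 'a', 1).edge(0, 'b', 1)
            .edge(1, 'c', 2)
            .accept(2);
    }

    public static void main(String[] args) {
        Dfa big = unminimized();
        Dfa small = minimized();
        System.out.println("states: " + big.numStates() + " vs " + small.numStates());
        for (String s : new String[] {"ac", "bc", "ab", "c", "", "acc"}) {
            // Both automata give the same answer for every input.
            System.out.println("'" + s + "': " + big.matches(s) + " / " + small.matches(s));
        }
    }
}
```

Since a query automaton is matched once and then discarded, the smaller state count rarely pays back the cost of computing it.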
After this change, performance numbers from
BenchmarkNativeAndLuceneBasedLike are:
Benchmark                                (_fstType)  (_intBaseValue)  (_numBlocks)  (_numRows)  (_query)                                                                    Mode  Cnt    Score     Error  Units
BenchmarkNativeAndLuceneBasedLike.query  LUCENE      1000             0             2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'     avgt    5   40.436  ±  8.662  us/op
BenchmarkNativeAndLuceneBasedLike.query  LUCENE      1000             0             2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   50.320  ±  4.254  us/op
BenchmarkNativeAndLuceneBasedLike.query  LUCENE      1000             1             2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'     avgt    5   42.378  ±  2.669  us/op
BenchmarkNativeAndLuceneBasedLike.query  LUCENE      1000             1             2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   53.890  ±  2.951  us/op
BenchmarkNativeAndLuceneBasedLike.query  LUCENE      1000             10            2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'     avgt    5   47.751  ±  1.149  us/op
BenchmarkNativeAndLuceneBasedLike.query  LUCENE      1000             10            2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   60.890  ±  1.949  us/op
BenchmarkNativeAndLuceneBasedLike.query  LUCENE      1000             100           2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'     avgt    5   93.937  ±  8.493  us/op
BenchmarkNativeAndLuceneBasedLike.query  LUCENE      1000             100           2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5  129.687  ± 16.903  us/op
BenchmarkNativeAndLuceneBasedLike.query  NATIVE      1000             0             2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'     avgt    5   55.362  ± 10.320  us/op
BenchmarkNativeAndLuceneBasedLike.query  NATIVE      1000             0             2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   16.610  ±  1.297  us/op
BenchmarkNativeAndLuceneBasedLike.query  NATIVE      1000             1             2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'     avgt    5   54.800  ±  1.501  us/op
BenchmarkNativeAndLuceneBasedLike.query  NATIVE      1000             1             2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   18.417  ±  0.696  us/op
BenchmarkNativeAndLuceneBasedLike.query  NATIVE      1000             10            2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'     avgt    5   60.187  ±  3.858  us/op
BenchmarkNativeAndLuceneBasedLike.query  NATIVE      1000             10            2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   25.549  ±  1.694  us/op
BenchmarkNativeAndLuceneBasedLike.query  NATIVE      1000             100           2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'     avgt    5  106.765  ± 13.996  us/op
BenchmarkNativeAndLuceneBasedLike.query  NATIVE      1000             100           2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   99.888  ±  1.029  us/op
Note that for generic match queries ('%domain%'), Native FST stays close to
Lucene from 0 blocks to 100 blocks, trailing by roughly 14% to 37%. For prefix
queries ('www.domain%'), Native FST is about 3x faster on 0 and 1 blocks, about
2.4x faster on 10 blocks, and about 1.3x faster on 100 blocks.
```
BenchmarkNativeAndLuceneBasedLike.query  LUCENE  1000  0    2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt  5   50.320  ±  4.254  us/op
BenchmarkNativeAndLuceneBasedLike.query  NATIVE  1000  0    2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt  5   16.610  ±  1.297  us/op
BenchmarkNativeAndLuceneBasedLike.query  LUCENE  1000  100  2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt  5  129.687  ± 16.903  us/op
BenchmarkNativeAndLuceneBasedLike.query  NATIVE  1000  100  2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt  5   99.888  ±  1.029  us/op
```
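The speedup figures for prefix queries can be re-derived from the reported scores by dividing the Lucene time by the Native time for each block count. The snippet below is a small sketch of that arithmetic. `SpeedupFromScores` and `speedup` are hypothetical names that are not part of the benchmark, and the numbers are copied from the results above.

```java
import java.util.Locale;

public class SpeedupFromScores {
    /** Ratio of Lucene time to Native time; values above 1 mean Native is faster. */
    static double speedup(double luceneMicros, double nativeMicros) {
        return luceneMicros / nativeMicros;
    }

    public static void main(String[] args) {
        int[] numBlocks = {0, 1, 10, 100};
        // avgt scores in us/op for LIKE 'www.domain%', copied from the benchmark results.
        double[] lucene = {50.320, 53.890, 60.890, 129.687};
        double[] nativeFst = {16.610, 18.417, 25.549, 99.888};

        for (int i = 0; i < numBlocks.length; i++) {
            System.out.printf(Locale.ROOT, "numBlocks=%d speedup=%.2fx%n",
                numBlocks[i], speedup(lucene[i], nativeFst[i]));
        }
    }
}
```

These ratios use only the mean scores. The reported error bars are wide for some rows, so small differences between configurations should be read with that in mind.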
This behaviour was observed over multiple runs of the benchmark. Detailed
results at:
https://docs.google.com/document/d/1Jd-Oe0F9gx9WAB1sa5YdW7KZ_EsPOJdcHK1bsON9JHM/edit?usp=sharing