atris opened a new pull request #8237:
URL: https://github.com/apache/pinot/pull/8237


   The native text engine minimises the query automaton after construction 
using Hopcroft's algorithm. This can get expensive for large query automata 
and does not yield much improvement anyway, since the query automaton is built 
once and used once. This change removes that minimisation step.
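
To illustrate why dropping minimisation does not affect query results, here is a minimal Java sketch. It is not Pinot's code: the class `MinimizeSketch`, the toy automaton, and the use of Moore-style partition refinement (a simpler stand-in for Hopcroft's algorithm that yields the same minimal automaton) are all assumptions made for illustration. Minimisation only merges equivalent states, so the accepted language stays the same.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MinimizeSketch {
    // Hypothetical toy DFA over {a, b}, start state 0, accepting strings of length >= 2.
    // NEXT[s][c] is the successor of state s on symbol c (0 = 'a', 1 = 'b').
    static final int[][] NEXT = {{1, 2}, {3, 3}, {3, 3}, {3, 3}};
    static final boolean[] ACCEPT = {false, false, false, true};

    // Moore-style partition refinement: returns the equivalence class of each state.
    static int[] stateClasses(int[][] next, boolean[] accept) {
        int n = next.length;
        int[] cls = new int[n];
        for (int s = 0; s < n; s++) {
            cls[s] = accept[s] ? 1 : 0;
        }
        int prevCount = 0;
        while (true) {
            Map<List<Integer>, Integer> ids = new HashMap<>();
            int[] refined = new int[n];
            for (int s = 0; s < n; s++) {
                // Signature: own class plus the classes of all successors.
                List<Integer> sig = new ArrayList<>();
                sig.add(cls[s]);
                for (int t : next[s]) {
                    sig.add(cls[t]);
                }
                Integer id = ids.get(sig);
                if (id == null) {
                    id = ids.size();
                    ids.put(sig, id);
                }
                refined[s] = id;
            }
            cls = refined;
            if (ids.size() == prevCount) {
                return cls; // no class was split in this round
            }
            prevCount = ids.size();
        }
    }

    static int countClasses(int[] cls) {
        int max = -1;
        for (int c : cls) {
            max = Math.max(max, c);
        }
        return max + 1;
    }

    static boolean accepts(int[][] next, boolean[] accept, int start, String input) {
        int s = start;
        for (char ch : input.toCharArray()) {
            s = next[s][ch - 'a'];
        }
        return accept[s];
    }

    // Checks that the original and merged automata agree on every string up to maxLen.
    static boolean sameLanguage(int maxLen) {
        int[] cls = stateClasses(NEXT, ACCEPT);
        int k = countClasses(cls);
        int[][] minNext = new int[k][2];
        boolean[] minAccept = new boolean[k];
        for (int s = 0; s < NEXT.length; s++) {
            minAccept[cls[s]] = ACCEPT[s];
            for (int c = 0; c < 2; c++) {
                minNext[cls[s]][c] = cls[NEXT[s][c]];
            }
        }
        List<String> frontier = new ArrayList<>();
        frontier.add("");
        for (int len = 0; len <= maxLen; len++) {
            List<String> longer = new ArrayList<>();
            for (String w : frontier) {
                if (accepts(NEXT, ACCEPT, 0, w) != accepts(minNext, minAccept, cls[0], w)) {
                    return false;
                }
                longer.add(w + "a");
                longer.add(w + "b");
            }
            frontier = longer;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("states before: " + NEXT.length);
        System.out.println("states after:  " + countClasses(stateClasses(NEXT, ACCEPT)));
        System.out.println("same language: " + sameLanguage(6));
    }
}
```

The refinement pass is work done up front on every state and transition. For an automaton that is traversed a single time, that cost can outweigh the saving from a smaller state count.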
   
   Performance numbers after this change, from 
BenchmarkNativeAndLuceneBasedLike:
   
   Benchmark                                (_fstType)  (_intBaseValue)  (_numBlocks)  (_numRows)                                                                    (_query)  Mode  Cnt    Score    Error  Units
   BenchmarkNativeAndLuceneBasedLike.query      LUCENE             1000             0     2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'  avgt    5   40.436 ±  8.662  us/op
   BenchmarkNativeAndLuceneBasedLike.query      LUCENE             1000             0     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   50.320 ±  4.254  us/op
   BenchmarkNativeAndLuceneBasedLike.query      LUCENE             1000             1     2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'  avgt    5   42.378 ±  2.669  us/op
   BenchmarkNativeAndLuceneBasedLike.query      LUCENE             1000             1     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   53.890 ±  2.951  us/op
   BenchmarkNativeAndLuceneBasedLike.query      LUCENE             1000            10     2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'  avgt    5   47.751 ±  1.149  us/op
   BenchmarkNativeAndLuceneBasedLike.query      LUCENE             1000            10     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   60.890 ±  1.949  us/op
   BenchmarkNativeAndLuceneBasedLike.query      LUCENE             1000           100     2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'  avgt    5   93.937 ±  8.493  us/op
   BenchmarkNativeAndLuceneBasedLike.query      LUCENE             1000           100     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5  129.687 ± 16.903  us/op
   BenchmarkNativeAndLuceneBasedLike.query      NATIVE             1000             0     2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'  avgt    5   55.362 ± 10.320  us/op
   BenchmarkNativeAndLuceneBasedLike.query      NATIVE             1000             0     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   16.610 ±  1.297  us/op
   BenchmarkNativeAndLuceneBasedLike.query      NATIVE             1000             1     2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'  avgt    5   54.800 ±  1.501  us/op
   BenchmarkNativeAndLuceneBasedLike.query      NATIVE             1000             1     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   18.417 ±  0.696  us/op
   BenchmarkNativeAndLuceneBasedLike.query      NATIVE             1000            10     2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'  avgt    5   60.187 ±  3.858  us/op
   BenchmarkNativeAndLuceneBasedLike.query      NATIVE             1000            10     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   25.549 ±  1.694  us/op
   BenchmarkNativeAndLuceneBasedLike.query      NATIVE             1000           100     2500000     SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE '%domain%'  avgt    5  106.765 ± 13.996  us/op
   BenchmarkNativeAndLuceneBasedLike.query      NATIVE             1000           100     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   99.888 ±  1.029  us/op
   
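   Note that for generic match queries ('%domain%'), Native FST stays close to 
Lucene from 0 to 100 blocks, at roughly 14% to 37% slower in the numbers above. 
For prefix queries ('www.domain%'), Native FST is about 3x faster at 0 and 1 
blocks, about 2.4x faster at 10 blocks, and about 1.3x faster at 100 blocks.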
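
The ratios quoted above can be recomputed directly from the Score column (average microseconds per operation, lower is better). The sketch below is illustrative only: the class `SpeedupCheck` and its field names are hypothetical, and the values are copied from the benchmark table.

```java
public class SpeedupCheck {
    // Scores (us/op) copied from the table above, indexed by BLOCKS.
    static final int[] BLOCKS = {0, 1, 10, 100};
    static final double[] LUCENE_PREFIX = {50.320, 53.890, 60.890, 129.687};
    static final double[] NATIVE_PREFIX = {16.610, 18.417, 25.549, 99.888};
    static final double[] LUCENE_GENERIC = {40.436, 42.378, 47.751, 93.937};
    static final double[] NATIVE_GENERIC = {55.362, 54.800, 60.187, 106.765};

    // How many times faster the candidate is than the baseline.
    // A value below 1.0 means the candidate is slower.
    static double speedup(double baselineUsPerOp, double candidateUsPerOp) {
        return baselineUsPerOp / candidateUsPerOp;
    }

    public static void main(String[] args) {
        for (int i = 0; i < BLOCKS.length; i++) {
            System.out.printf(
                "blocks=%d prefix: native is %.2fx Lucene speed, generic: %.2fx%n",
                BLOCKS[i],
                speedup(LUCENE_PREFIX[i], NATIVE_PREFIX[i]),
                speedup(LUCENE_GENERIC[i], NATIVE_GENERIC[i]));
        }
    }
}
```

These ratios ignore the Error column. Where the error bars overlap, as they do for the generic query at 0 and 100 blocks, the measured difference may not be significant.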
   
   
   ```
   BenchmarkNativeAndLuceneBasedLike.query      LUCENE             1000             0     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   50.320 ±  4.254  us/op
   
   BenchmarkNativeAndLuceneBasedLike.query      NATIVE             1000             0     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   16.610 ±  1.297  us/op
   
   BenchmarkNativeAndLuceneBasedLike.query      LUCENE             1000           100     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5  129.687 ± 16.903  us/op
   
   BenchmarkNativeAndLuceneBasedLike.query      NATIVE             1000           100     2500000  SELECT INT_COL, URL_COL FROM MyTable WHERE DOMAIN_NAMES LIKE 'www.domain%'  avgt    5   99.888 ±  1.029  us/op
   ```
   
   This behaviour was observed over multiple runs of the benchmark. Detailed 
results at:
   
    
https://docs.google.com/document/d/1Jd-Oe0F9gx9WAB1sa5YdW7KZ_EsPOJdcHK1bsON9JHM/edit?usp=sharing
   
   




