[GitHub] [lucene] msokolov commented on pull request #235: LUCENE-9614: add KnnVectorQuery implementation

GitBox Fri, 13 Aug 2021 05:14:24 -0700


msokolov commented on pull request #235:
URL: https://github.com/apache/lucene/pull/235#issuecomment-898416859



   I ran a luceneutil test comparing this against the KnnQuery implementation that luceneutil already has, which does its work in createWeight rather than in rewrite, and saw no meaningful difference. It's perhaps a somewhat bogus comparison, but it's the best we have right now, and at least it shows we didn't do anything more boneheaded than before.
   
   ```
                Task  QPS baseline    StdDev  QPS candidate    StdDev                Pct diff  p-value
       LowTermVector        685.43    (6.9%)         664.26    (7.1%)   -3.1% ( -15% -   11%)    0.162
    AndHighMedVector        656.70    (6.1%)         646.58    (2.9%)   -1.5% (  -9% -    7%)    0.308
      HighTermVector        703.33   (11.8%)         704.22    (5.9%)    0.1% ( -15% -   20%)    0.966
    AndHighLowVector        667.55    (7.6%)         669.66    (5.2%)    0.3% ( -11% -   14%)    0.878
            PKLookup        187.46    (1.2%)         188.41    (0.6%)    0.5% (  -1% -    2%)    0.100
       MedTermVector        636.03    (5.9%)         645.88    (5.0%)    1.5% (  -8% -   13%)    0.371
   AndHighHighVector        642.66    (5.8%)         669.80    (6.3%)    4.2% (  -7% -   17%)    0.027
   ```
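
   To make the table easier to read: the "Pct diff" column appears to be the relative QPS change, (candidate - baseline) / baseline. That formula is an assumption about how luceneutil computes it, but it reproduces every value in the column when rounded to one decimal. The bracketed ranges and p-values come from luceneutil's variance estimates and are not recomputed here. `PctDiff` is a hypothetical helper, not part of luceneutil.

```java
import java.util.Locale;

/** Recomputes the "Pct diff" column of the benchmark table from the two QPS columns. */
class PctDiff {

    /** Percent change in QPS from baseline to candidate (assumed formula). */
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        String[] tasks = {"LowTermVector", "AndHighMedVector", "HighTermVector",
                          "AndHighLowVector", "PKLookup", "MedTermVector", "AndHighHighVector"};
        double[] baseline = {685.43, 656.70, 703.33, 667.55, 187.46, 636.03, 642.66};
        double[] candidate = {664.26, 646.58, 704.22, 669.66, 188.41, 645.88, 669.80};
        for (int i = 0; i < tasks.length; i++) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's locale.
            System.out.printf(Locale.ROOT, "%20s %6.1f%%%n", tasks[i], pctDiff(baseline[i], candidate[i]));
        }
    }
}
```

   Only AndHighHighVector has a p-value below 0.05, and most other differences are smaller than the per-run StdDev, which is consistent with reading the overall result as "no meaningful difference".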
   
   By the way, I also tried the pro-rating idea I had posted earlier, with mixed results: it consistently made HighTermVector better and MedTermVector worse (by quite a bit, about 15% less QPS), which really surprised me. Perhaps having a tiny PQ (say, top K = 1) makes the graph exploration much less efficient? It's also possible that this index is skewed and the query has to re-run many times. This needs further investigation.
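
   For readers who missed the earlier post, here is a minimal sketch of what pro-rating k across segments could look like, assuming the idea is to search each segment for a share of k proportional to its document count (rounded up, never below 1). The class and method names and the exact formula are illustrative assumptions, not Lucene API. With skewed segment sizes, every small segment ends up with k = 1, which is the "tiny PQ" case suspected above.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: pro-rate a global top-k across index segments so that
 * each segment is searched with a k proportional to its share of documents.
 * Not Lucene API; the formula is an assumption for illustration.
 */
class ProratedK {

    /** Per-segment k = ceil(k * leafDocs / totalDocs), clamped to at least 1. */
    static int perLeafK(int k, int leafDocs, int totalDocs) {
        long share = ((long) k * leafDocs + totalDocs - 1) / totalDocs; // integer ceiling
        return (int) Math.max(1, share);
    }

    public static void main(String[] args) {
        int k = 10;
        int[] leafDocs = {900_000, 90_000, 9_000, 1_000}; // deliberately skewed segment sizes
        int total = Arrays.stream(leafDocs).sum();
        for (int docs : leafDocs) {
            // The three smaller segments all get a priority queue of size 1 here.
            System.out.println(docs + " docs -> k = " + perLeafK(k, docs, total));
        }
    }
}
```

   Because each segment only collects its share, a segment that happens to hold more than its share of the true top hits would be under-collected; a pro-rated scheme therefore needs some check that triggers a re-run with a larger k, which this sketch omits. Frequent re-runs on a skewed index would be one way to lose QPS.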
   
   Finally, I think this is ready to push. I'll push later today if no new issues are raised.

