msokolov commented on pull request #235:
URL: https://github.com/apache/lucene/pull/235#issuecomment-898416859
I ran a luceneutil test comparing this PR against the KnnQuery implementation
we have there, which does its work in createWeight rather than in rewrite, and
saw no significant difference. The comparison is perhaps a little bogus, but
it's the best we have right now, and it at least shows we didn't do anything
more boneheaded than before.
```
Task               QPS baseline    StdDev   QPS candidate    StdDev   Pct diff               p-value
LowTermVector            685.43    (6.9%)          664.26    (7.1%)    -3.1% ( -15% - 11%)   0.162
AndHighMedVector         656.70    (6.1%)          646.58    (2.9%)    -1.5% ( -9% - 7%)     0.308
HighTermVector           703.33   (11.8%)          704.22    (5.9%)     0.1% ( -15% - 20%)   0.966
AndHighLowVector         667.55    (7.6%)          669.66    (5.2%)     0.3% ( -11% - 14%)   0.878
PKLookup                 187.46    (1.2%)          188.41    (0.6%)     0.5% ( -1% - 2%)     0.100
MedTermVector            636.03    (5.9%)          645.88    (5.0%)     1.5% ( -8% - 13%)    0.371
AndHighHighVector        642.66    (5.8%)          669.80    (6.3%)     4.2% ( -7% - 17%)    0.027
```
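For anyone reading the table cold, here is a minimal sketch of how the "Pct diff" column and the range in parentheses can be reproduced from the QPS and StdDev columns. The function names are made up for illustration, and the range formula (candidate and baseline each pushed one standard deviation in opposite directions, then truncated toward zero) is an assumption that happens to match the printed numbers, not something taken from the luceneutil source.

```python
def pct_diff(baseline_qps, candidate_qps):
    """Relative change of candidate QPS versus baseline QPS, in percent."""
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


def pct_range(baseline_qps, baseline_sd_pct, candidate_qps, candidate_sd_pct):
    """Worst-case and best-case percent diff (assumed formula, see lead-in).

    Each QPS is shifted by its own standard deviation, given in percent,
    and the result is truncated toward zero like the table appears to do.
    """
    base_lo = baseline_qps * (1 - baseline_sd_pct / 100.0)
    base_hi = baseline_qps * (1 + baseline_sd_pct / 100.0)
    cand_lo = candidate_qps * (1 - candidate_sd_pct / 100.0)
    cand_hi = candidate_qps * (1 + candidate_sd_pct / 100.0)
    worst = 100.0 * (cand_lo - base_hi) / base_hi
    best = 100.0 * (cand_hi - base_lo) / base_lo
    return int(worst), int(best)


# LowTermVector row from the table above
low_term_diff = round(pct_diff(685.43, 664.26), 1)       # -3.1
low_term_range = pct_range(685.43, 6.9, 664.26, 7.1)     # (-15, 11)
```

A range that straddles zero, as it does for every row here, means the measured difference is within run-to-run noise under this reading.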
By the way, I also tried the pro-rating idea I had posted earlier, with mixed
results: it consistently made HighTermVector better and MedTermVector worse
(by quite a bit, around 15% less QPS), which really surprised me. Perhaps
having a tiny PQ (top K = 1, say) makes the graph exploration quite a bit less
efficient? It's also possible this index is skewed and the query has to re-run
a bunch of times... Needs further investigation.
Finally, I think this is ready to push. I'll push later today if there are
no new issues raised.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]