Hi Solr,

I've hit a performance issue when combining {!knn} queries with group=true.
I'd like to file a ticket for it and contribute a fix.

When a grouped request needs scores (score in fl, or sort=score desc),
Grouping.populateScoresIfNecessary() calls
TopFieldCollector.populateScores() once per group, and each call invokes
searcher.rewrite(query) on the original, un-rewritten KNN query.
(Grouping.populateScoresIfNecessary() was presumably written on the
assumption that query rewriting is cheap.) But
AbstractKnnVectorQuery.rewrite() is where the full HNSW graph search
executes, so every group triggers a complete traversal from scratch.
Combined with the two-pass collection, that adds up to 2 + N full HNSW
traversals per request (N = number of groups).

In our workload, this takes KNN P50 from 32 ms ungrouped to 1,115 ms
grouped (~35x).

The issue is present in 9.8 and unchanged in the 10.0 branch (the logic in
Grouping.java is the same).

The fix I have in mind is a one-liner at the top of the grouping pipeline:

    query = searcher.rewrite(query);

For KNN, this returns a DocAndScoreQuery with cached results, so subsequent
rewrite() calls in populateScores() become no-ops. For non-KNN queries,
rewrite() is already cheap and idempotent, so it should not cause issues.
The Grouping.query field is only used for the populateScores() calls (the
first and second passes use a separate searchQuery local variable), so this
doesn't affect other code paths.
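
To make the counting concrete, here is a toy Java model of the behaviour
described above. It is not Lucene or Solr code: ToyQuery, ExpensiveQuery,
CachedResultQuery and countTraversals are names made up for this sketch, and
it assumes that each of the two collection passes costs exactly one
traversal of searchQuery and is unaffected by the change.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RewriteOnceSketch {

    /** Toy stand-in for a query; hypothetical, not Lucene's Query class. */
    interface ToyQuery {
        ToyQuery rewrite();
    }

    /** Rewritten form holding precomputed results; rewriting it again returns itself. */
    static final class CachedResultQuery implements ToyQuery {
        @Override
        public ToyQuery rewrite() {
            return this;
        }
    }

    /** Query whose rewrite() performs the expensive search, as described for KNN. */
    static final class ExpensiveQuery implements ToyQuery {
        private final AtomicInteger traversals;

        ExpensiveQuery(AtomicInteger traversals) {
            this.traversals = traversals;
        }

        @Override
        public ToyQuery rewrite() {
            traversals.incrementAndGet(); // stands for one full graph traversal
            return new CachedResultQuery();
        }
    }

    /** Counts simulated traversals for one grouped request. */
    static int countTraversals(int groups, boolean rewriteUpFront) {
        AtomicInteger traversals = new AtomicInteger();
        ToyQuery searchQuery = new ExpensiveQuery(traversals); // used by the two passes
        ToyQuery query = new ExpensiveQuery(traversals);       // used for score population

        if (rewriteUpFront) {
            query = query.rewrite(); // the proposed one-liner, in toy form
        }

        // Assumption: each of the two collection passes rewrites searchQuery once.
        searchQuery.rewrite();
        searchQuery.rewrite();

        // Score population: one rewrite per group.
        for (int g = 0; g < groups; g++) {
            query.rewrite();
        }
        return traversals.get();
    }

    public static void main(String[] args) {
        int groups = 10;
        System.out.println("without up-front rewrite: " + countTraversals(groups, false));
        System.out.println("with up-front rewrite:    " + countTraversals(groups, true));
    }
}
```

With 10 groups, the model counts 12 traversals without the up-front rewrite
(2 + N) and 3 with it (the two passes plus the single rewrite), independent
of N.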

I didn't find an existing ticket for this. I'd like to file one against the
search component targeting 9.x and 10.x, and put up a patch.

This is my first post to dev@solr.

Thank you,
Philipp