adamjq commented on code in PR #4532:
URL: https://github.com/apache/solr/pull/4532#discussion_r3494011849
##########
solr/solr-ref-guide/modules/query-guide/pages/dense-vector-search.adoc:
##########
@@ -814,7 +814,37 @@ Some use cases where `includeTags` and/or `excludeTags`
may be more useful then
-=== Usage in Re-Ranking Query
+[[vector-reranking]]
+== Usage in Re-Ranking Query
+
+Dense vector similarity scores can be used to xref:query-guide:query-re-ranking.adoc[re-rank] first pass query results.
+Possible use cases include:
+
+* Re-ranking approximate results from a quantized vector field using full fidelity float vectors.
+* Re-ranking lexical search results with dense vector similarity scores.
+
+Details about using the ReRank Query Parser can be found in the xref:query-guide:query-re-ranking.adoc[Query Re-Ranking] section.
+
+=== Re-Ranking with vectorSimilarity Function Query
+
+The xref:query-guide:function-queries.adoc#vectorsimilarity-function[vectorSimilarity()] function can be used with the `{!func}` query parser to re-rank by vector similarity.
+When used as a function query, `vectorSimilarity()` computes the exact similarity for only the candidate documents selected for re-ranking, without traversing the index graph.
+
+Here is an example of re-ranking a lexical query using a `DenseVectorField` named `vector`:
+
+[source,text]
+?q=title:phone&rq={!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=1}&rqq={!func}vectorSimilarity(vector,[1.0,2.0,3.0,4.0])
+
+NOTE: The default `reRankOperator` is `add`, which sums the first-pass score and the vector similarity score.
+Since these scores may differ in magnitude, you can adjust `reRankWeight` to control the balance between them, or use `reRankOperator=replace` to score re-ranked documents by vector similarity alone.
+
+When using a quantized vector field type (such as `ScalarQuantizedDenseVectorField`), the KNN first pass scores are computed on the quantized vectors.
+Here is an example of re-ranking those results with exact float similarity scores, where `topK` matches `reRankDocs`:
+
+[source,text]
+?q={!knn f=vector topK=100}[1.0,2.0,3.0,4.0]&rq={!rerank reRankQuery=$rqq reRankDocs=100 reRankWeight=1 reRankOperator=replace}&rqq={!func}vectorSimilarity(vector,[1.0,2.0,3.0,4.0])
Review Comment:
It's interesting that Copilot flagged this, because it is a source of
confusion in the existing documentation, and I believe the claim made is
incorrect. It would be true if the field used `BYTE` encoding (meaning int8
vectors are supplied externally), but not when Solr itself quantizes the
vectors.

`ScalarQuantizedDenseVectorField` builds the HNSW graph from the
scalar-quantized representations of the embeddings. The original float
embeddings remain accessible via `FloatVectorValues`, and the full-precision
vectors are used for re-quantization during segment merges.

The code path for `{!func}vectorSimilarity` is:

1. [VectorSimilaritySourceParser](https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/search/VectorSimilaritySourceParser.java#L108) calls `getValueSource`
2. [DenseVectorField.getValueSource](https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/schema/DenseVectorField.java#L487) returns `new FloatKnnVectorFieldSource(field.getName())`
3. [(Lucene) FloatKnnVectorFieldSource](https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/FloatKnnVectorFieldSource.java#L46) reads `getFloatVectorValues(fieldName);`

I confirmed this locally with unit tests that do the following:

- Test 1: KNN vs KNN + rerank on `DenseVectorField`. Scores are identical, as both paths use exact floats.
- Test 2: KNN vs KNN + rerank on `ScalarQuantizedDenseVectorField`. Scores differ, as KNN uses quantized values and the rerank uses exact floats.

I can add the unit tests to the PR if that would be useful.
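
To illustrate why the scores in Test 2 would be expected to differ, here is a minimal standalone sketch. It is not the unit test mentioned above and does not use Solr or Lucene: the class `QuantizedVsExactScore`, its methods, and the simple symmetric int8 scheme are all assumptions made for illustration, and Lucene's actual scalar quantization works differently in its details. It only shows that a dot product computed on quantized values generally does not match the one computed on the original floats.

```java
public final class QuantizedVsExactScore {

    /** Exact dot product over full-precision floats. */
    public static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Toy symmetric int8 quantization (an assumption, not Lucene's scheme):
     * maps [-maxAbs, maxAbs] linearly onto [-127, 127], clamping outliers.
     */
    public static byte[] quantize(float[] v, float maxAbs) {
        byte[] q = new byte[v.length];
        for (int i = 0; i < v.length; i++) {
            float clamped = Math.max(-maxAbs, Math.min(maxAbs, v[i]));
            q[i] = (byte) Math.round(clamped / maxAbs * 127f);
        }
        return q;
    }

    /** Dot product on the quantized values, rescaled back to the float range. */
    public static float quantizedDot(byte[] a, byte[] b, float maxAbs) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        float scale = maxAbs / 127f;
        return sum * scale * scale;
    }

    public static void main(String[] args) {
        // Hypothetical query and document embeddings.
        float[] query = {0.12f, -0.57f, 0.33f, 0.91f};
        float[] doc = {0.10f, -0.49f, 0.41f, 0.88f};
        float maxAbs = 1.0f;

        float exact = dot(query, doc);
        float approx = quantizedDot(quantize(query, maxAbs), quantize(doc, maxAbs), maxAbs);

        System.out.println("exact (float) score   = " + exact);
        System.out.println("approx (int8) score   = " + approx);
        System.out.println("absolute difference   = " + Math.abs(exact - approx));
    }
}
```

The rounding step in `quantize` is where information is lost, so the two scores end up close but not equal. That matches the Test 2 observation in spirit: a first pass scored on quantized values and a rerank scored on the stored floats can disagree slightly.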
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]