vigyasharma commented on code in PR #14708:
URL: https://github.com/apache/lucene/pull/14708#discussion_r2118819077
########## lucene/core/src/java/org/apache/lucene/search/ByteVectorSimilarityValuesSource.java: ##########

```diff
@@ -42,7 +78,35 @@ public VectorScorer getScorer(LeafReaderContext ctx) throws IOException {
       ByteVectorValues.checkField(ctx.reader(), fieldName);
       return null;
     }
-    return vectorValues.scorer(queryVector);
+    final FieldInfo fi = ctx.reader().getFieldInfos().fieldInfo(fieldName);
+    if (fi.getVectorDimension() != queryVector.length) {
+      throw new IllegalArgumentException(
+          "Query vector dimension does not match field dimension: "
+              + queryVector.length
+              + " != "
+              + fi.getVectorDimension());
+    }
+
+    // default vector scorer
+    if (useFullPrecision == false) {
+      return vectorValues.scorer(queryVector);
+    }
+
+    final VectorSimilarityFunction vectorSimilarityFunction = fi.getVectorSimilarityFunction();
+    return new VectorScorer() {
+      final KnnVectorValues.DocIndexIterator iterator = vectorValues.iterator();
+
+      @Override
+      public float score() throws IOException {
+        return vectorSimilarityFunction.compare(
+            queryVector, vectorValues.vectorValue(iterator.index()));
```

Review Comment:
It's a valid concern for setups with limited memory.

> As HNSW will suffer the performance if the vectors are not in RAM, I'm wondering if we can restrict the memory used by the re-ranking phase.

Maybe. But I wonder how we would decide that the pages used for HNSW search are more important than the pages used for full-precision (FP) reranking. For an application that runs a KNN search and then reranks with full-precision vectors, a query isn't complete until both phases finish. Wouldn't queries that thrash during reranking still add to the overall query latency? That might be acceptable if only a subset of queries were reranked and the vast majority were still plain HNSW searches, but that seems very use-case specific. Might it be best to let the OS page cache handle this? Either way, I think this deserves its own discussion, perhaps in a separate issue.
And, as you and others have already mentioned, it can be handled independently of this PR.
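To make the two-phase cost in this discussion concrete, here is a minimal, self-contained Java sketch of full-precision reranking. It is not Lucene API and not the PR's implementation: the class and method names (`FullPrecisionRerank`, `rerank`, `dotProduct`, `ScoredDoc`) are hypothetical, the first phase (for example HNSW over quantized vectors) is stubbed out as a fixed array of candidate doc ids, and the score is a raw byte dot product rather than Lucene's normalized `VectorSimilarityFunction` score. The dimension check mirrors the one added in the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene API: rescore first-phase candidates
// against full-precision byte vectors and keep the top k.
public class FullPrecisionRerank {

  // A candidate doc id paired with its (re)computed score.
  record ScoredDoc(int doc, float score) {}

  // Raw dot product of two byte vectors; rejects mismatched dimensions,
  // similar to the check added in getScorer().
  static float dotProduct(byte[] query, byte[] stored) {
    if (query.length != stored.length) {
      throw new IllegalArgumentException(
          "Query vector dimension does not match field dimension: "
              + query.length
              + " != "
              + stored.length);
    }
    int sum = 0;
    for (int i = 0; i < query.length; i++) {
      sum += query[i] * stored[i]; // byte * byte is promoted to int
    }
    return sum;
  }

  // Second phase: every candidate reads its full-precision vector. If that
  // data is not resident in memory, this loop is where the I/O cost lands.
  static List<ScoredDoc> rerank(
      byte[] query, byte[][] fullPrecisionVectors, int[] candidates, int k) {
    List<ScoredDoc> rescored = new ArrayList<>();
    for (int doc : candidates) {
      rescored.add(new ScoredDoc(doc, dotProduct(query, fullPrecisionVectors[doc])));
    }
    // Highest score first.
    rescored.sort((a, b) -> Float.compare(b.score(), a.score()));
    return new ArrayList<>(rescored.subList(0, Math.min(k, rescored.size())));
  }

  public static void main(String[] args) {
    byte[][] vectors = {{1, 0, 0}, {0, 2, 0}, {3, 3, 0}, {0, 0, 5}};
    byte[] query = {1, 1, 0};
    int[] candidates = {0, 1, 2}; // stand-in for the approximate first-phase results
    for (ScoredDoc d : rerank(query, vectors, candidates, 2)) {
      System.out.println(d.doc() + " " + d.score());
    }
  }
}
```

Running `main` prints `2 6.0` and then `1 2.0`. The reranking loop touches one full-precision vector per candidate, so its latency depends directly on whether those vectors are in the page cache, which is the trade-off against HNSW graph pages raised in the comment.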