irvingzhang opened a new pull request #1295: Lucene-9004: bug fix for searching the nearest one neighbor in higher layers
URL: https://github.com/apache/lucene-solr/pull/1295

`if (dist < f.distance() || results.size() < ef) { Neighbor n = new ImmutableNeighbor(e.docId(), dist); candidates.add(n); results.insertWithOverflow(n); f = results.top(); }`

If `dist < f.distance()` holds while `results.size() >= ef`, the Neighbor "n" is still added to "results" ("results" is a sub-type of PriorityQueue), because insertWithOverflow only evicts an element once the queue has reached its own max size. The actual size of "results" can therefore end up anywhere between "ef" and the queue's max size, while its expected size is "ef".

Consider the following situation:

`FurthestNeighbors neighbors = new FurthestNeighbors(ef, ep); for (int l = hnsw.topLevel(); l > 0; l--) { visitedCount += hnsw.searchLayer(query, neighbors, 1, l, vectorValues); } visitedCount += hnsw.searchLayer(query, neighbors, ef, 0, vectorValues);`

Here the max size of "neighbors" ("neighbors" is also a sub-type of PriorityQueue) is ef (assume ef > 1). When searching a non-zero layer, we want to find only the single nearest neighbor via `hnsw.searchLayer(query, neighbors, 1, l, vectorValues);`, where l is the layer and l > 0. However, the actual size of "neighbors" may become larger than 1.

Assuming "results.size() <= ef" should hold, I think calling "results.pop();" when "results.size() == ef" can solve this problem.
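To make the size mismatch concrete, here is a minimal, self-contained sketch of the behavior described above. It is not the Lucene code: `EfBoundSketch`, `newResults`, `sizeAfterOriginal`, and `sizeAfterProposedFix` are hypothetical names. A `java.util.PriorityQueue` of plain distances stands in for "results", and the `insertWithOverflow` helper only imitates a bounded queue that evicts its top element when it overflows. The size check is evaluated first in the condition so that the sketch can start from an empty queue.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical stand-in for the "results" queue in the PR; not Lucene's actual classes.
final class EfBoundSketch {

  // Max-heap of distances: peek() returns the furthest neighbor, like results.top().
  static PriorityQueue<Double> newResults() {
    return new PriorityQueue<>(Collections.reverseOrder());
  }

  // Imitates a bounded insert: the furthest entry is evicted only when maxSize is exceeded.
  static void insertWithOverflow(PriorityQueue<Double> results, int maxSize, double dist) {
    results.add(dist);
    if (results.size() > maxSize) {
      results.poll();
    }
  }

  // Condition as quoted in the PR: the queue is bounded by maxSize, not by ef.
  static int sizeAfterOriginal(double[] dists, int ef, int maxSize) {
    PriorityQueue<Double> results = newResults();
    for (double dist : dists) {
      if (results.size() < ef || dist < results.peek()) {
        insertWithOverflow(results, maxSize, dist);
      }
    }
    return results.size();
  }

  // Proposed fix (as read from the PR text): pop the furthest entry when size == ef.
  static int sizeAfterProposedFix(double[] dists, int ef, int maxSize) {
    PriorityQueue<Double> results = newResults();
    for (double dist : dists) {
      if (results.size() < ef || dist < results.peek()) {
        if (results.size() == ef) {
          results.poll(); // make room so the size never exceeds ef
        }
        insertWithOverflow(results, maxSize, dist);
      }
    }
    return results.size();
  }

  public static void main(String[] args) {
    double[] dists = {5.0, 4.0, 3.0, 2.0}; // candidates arriving closer and closer
    System.out.println("original:     " + sizeAfterOriginal(dists, 1, 3));
    System.out.println("proposed fix: " + sizeAfterProposedFix(dists, 1, 3));
  }
}
```

With ef = 1 and a queue max size of 3, the original condition leaves 3 entries in the queue, while the variant with the extra pop keeps exactly 1, which matches the "nearest one neighbor" intent for the higher layers.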