msokolov commented on code in PR #12413: URL: https://github.com/apache/lucene/pull/12413#discussion_r1253495097
########## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ########## @@ -256,6 +256,72 @@ public NeighborQueue searchLevel( return results; } + /** + * Function to find the best entry point from which to search the zeroth graph layer. + * + * @param query vector query with which to search + * @param vectors random access vector values + * @param graph the HNSWGraph + * @param visitLimit How many vectors are allowed to be visited + * @return An integer array whose first element is the best entry point, and second is the number + * of candidates visited. Entry point of `-1` indicates visitation limit exceed + * @throws IOException When accessing the vector fails + */ + private int[] findBestEntryPoint( + T query, RandomAccessVectorValues<T> vectors, HnswGraph graph, int visitLimit) + throws IOException { + int size = graph.size(); + int visitedCount = 1; + prepareScratchState(vectors.size()); + final NeighborQueue results = new NeighborQueue(1, false); + int currentEp = graph.entryNode(); + float currentScore = compare(query, vectors, currentEp); + float minAcceptedSimilarity = currentScore; + results.add(currentEp, currentScore); + for (int level = graph.numLevels() - 1; level >= 1; level--) { + candidates.add(currentEp, currentScore); + visited.set(currentEp); + // Keep searching the given level until we stop finding a better candidate entry point + while (candidates.size() > 0) { + // get the best candidate (closest or best scoring) + float topCandidateSimilarity = candidates.topScore(); + if (topCandidateSimilarity < minAcceptedSimilarity) { + break; + } + + int topCandidateNode = candidates.pop(); + graphSeek(graph, level, topCandidateNode); + int friendOrd; + while ((friendOrd = graphNextNeighbor(graph)) != NO_MORE_DOCS) { + assert friendOrd < size : "friendOrd=" + friendOrd + "; size=" + size; + if (visited.getAndSet(friendOrd)) { + continue; + } + if (visitedCount >= visitLimit) { + return new int[] {-1, visitedCount}; + } + float friendSimilarity = compare(query, vectors, friendOrd); + visitedCount++; + if (friendSimilarity >= minAcceptedSimilarity) { + candidates.add(friendOrd, friendSimilarity); + if (results.insertWithOverflow(friendOrd, friendSimilarity) && results.size() >= 1) { Review Comment: it seems a little odd to preserve the ceremony of adding to a priority queue that will always be of length 1, although I suppose this preserves the idea of length > 1? Maybe we would want to do that? ########## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphSearcher.java: ########## @@ -204,26 +204,26 @@ private static <T> NeighborQueue search( if (initialEp == -1) { return new NeighborQueue(1, true); } - NeighborQueue results; - results = new NeighborQueue(1, false); - int[] eps = new int[] {graph.entryNode()}; - int numVisited = 0; - for (int level = graph.numLevels() - 1; level >= 1; level--) { - results.clear(); - graphSearcher.searchLevel(results, query, 1, level, eps, vectors, graph, null, visitedLimit); - - numVisited += results.visitedCount(); - visitedLimit -= results.visitedCount(); - - if (results.incomplete()) { - results.setVisitedCount(numVisited); Review Comment: Wouldn't it be simpler to discard results here? There will never be more than one, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org