aruggero commented on code in PR #16034:
URL: https://github.com/apache/lucene/pull/16034#discussion_r3202353225


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/AbstractHnswGraphSearcher.java:
##########
@@ -96,14 +96,89 @@ protected static void scoreEntryPoints(
     assert scores != null && scores.length >= eps.length;
     scorer.bulkScore(eps, scores, eps.length);
     results.incVisitedCount(eps.length);
+    float[] siblingScores = null;
+    int[] siblingsOrd = new int[0];
     for (int i = 0; i < eps.length; i++) {
       float score = scores[i];
       int ep = eps[i];
       visited.set(ep);
       candidates.add(ep, score);
       if (acceptOrds == null || acceptOrds.get(ep)) {
+        // Fetch siblingsOrd BEFORE collect() so the parent is not yet in the 
heap
+        int numSiblingsToVisit = 0;
+        // The instanceof check is needed: this method is also called with a 
GraphBuilderKnnCollector
+        if (results instanceof OrdinalTranslatedKnnCollector collector) {
+          if (collector.isSiblingExpansionCollector()) {
+            siblingsOrd = collector.getSiblingOrdinals(ep, visited, 
siblingsOrd);
+            if (siblingsOrd.length > 0) {
+              //  how many siblingsOrd are actually scored to avoid exceeding 
the visit budget.
+              //  controls the early termination condition. early terminates 
the search if we reach
+              //  visitLimit nodes
+              //  if this visit limit is high we just navigate the graph until 
we do not have any node
+              //  with a score higher than the ones already collected
+              //  Current values it could assume:
+              //  -  No filter → Integer.MAX_VALUE (no constraint tighter than 
the full segment)
+              //  -  With filter → cardinality (no constraint tighter than the 
full accepted set)
+              numSiblingsToVisit =
+                  (int) Math.min(siblingsOrd.length, results.visitLimit() - 
results.visitedCount());

Review Comment:
   Hi @benwtrent ,
   In your opinion, does it make sense to you to cut off the number of siblings 
to explore, looking at the `results.visitLimit()`?
   From what @alessandrobenedetti and I have seen, this limit seems to be set 
to:
   -  No filter → Integer.MAX_VALUE (no constraint tighter than the full 
segment)
   -  With filter → cardinality (no constraint tighter than the full accepted 
set)
   As described in the comment above...
   
   What does this represent? It doesn't seem to be set as a query parameter.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to