benwtrent commented on code in PR #16034:
URL: https://github.com/apache/lucene/pull/16034#discussion_r3202491277


##########
lucene/core/src/java/org/apache/lucene/util/hnsw/AbstractHnswGraphSearcher.java:
##########
@@ -96,14 +96,89 @@ protected static void scoreEntryPoints(
     assert scores != null && scores.length >= eps.length;
     scorer.bulkScore(eps, scores, eps.length);
     results.incVisitedCount(eps.length);
+    float[] siblingScores = null;
+    int[] siblingsOrd = new int[0];
     for (int i = 0; i < eps.length; i++) {
       float score = scores[i];
       int ep = eps[i];
       visited.set(ep);
       candidates.add(ep, score);
       if (acceptOrds == null || acceptOrds.get(ep)) {
+        // Fetch siblingsOrd BEFORE collect() so the parent is not yet in the 
heap
+        int numSiblingsToVisit = 0;
+        // The instanceof check is needed: this method is also called with a 
GraphBuilderKnnCollector
+        if (results instanceof OrdinalTranslatedKnnCollector collector) {
+          if (collector.isSiblingExpansionCollector()) {
+            siblingsOrd = collector.getSiblingOrdinals(ep, visited, 
siblingsOrd);
+            if (siblingsOrd.length > 0) {
+              //  how many siblingsOrd are actually scored to avoid exceeding 
the visit budget.
+              //  controls the early termination condition. early terminates 
the search if we reach
+              //  visitLimit nodes
+              //  if this visit limit is high we just navigate the graph until 
we do not have any node
+              //  with a score higher than the ones already collected
+              //  Current values it could assume:
+              //  -  No filter → Integer.MAX_VALUE (no constraint tighter than 
the full segment)
+              //  -  With filter → cardinality (no constraint tighter than the 
full accepted set)
+              numSiblingsToVisit =
+                  (int) Math.min(siblingsOrd.length, results.visitLimit() - 
results.visitedCount());

Review Comment:
   `visitLimit` is a "best effort" termination of exploration in filtered 
scenarios when we explore too many nodes during filtered search and spent too 
much compute which should have just been done via brute-force. 
   
   It isn't directly set by anything other than filter cardinality, I am not 
sure its a useful parameter to pay attention to in this particular usecase.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to