[GitHub] [lucene] benwtrent opened a new pull request, #12413: Fix HNSW graph visitation limit bug

via GitHub Mon, 03 Jul 2023 14:36:52 -0700


benwtrent opened a new pull request, #12413:
URL: https://github.com/apache/lucene/pull/12413


   We have some weird behavior in the HNSW searcher when finding the candidate 
entry point for the zeroth layer. 
   
   While trying to find the best entry point to gather the full candidate sets, 
we don't filter based on the acceptableOrds bitset. Consequently, if we exit 
the search early (before hitting the zeroth layer), the results that are 
returned may contain documents NOT within that bitset. 
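   
   To make the failure mode concrete, here is a toy sketch in Python. It is 
not the Lucene code: the graph, the scores, and every function name are 
hypothetical, and the layer-0 step is reduced to a filtered scan of the entry 
point's neighbors. It only models the early exit described above, plus one 
possible guard (dropping an unfiltered entry point when the visit limit is hit).

```python
def greedy_entry_point(neighbors, score, start, visit_limit):
    """Greedy descent on an upper layer. Note: it never consults the filter."""
    best, best_score = start, score(start)
    visited = 1
    improved = True
    while improved:
        improved = False
        for n in neighbors.get(best, ()):
            if visited >= visit_limit:
                return best, visited, True  # visit limit hit before layer 0
            visited += 1
            s = score(n)
            if s > best_score:
                best, best_score, improved = n, s, True
    return best, visited, False


def search(upper, base, score, start, accept, visit_limit, filter_on_early_exit):
    """Returns (hits, incomplete). All names here are illustrative only."""
    ep, _, hit_limit = greedy_entry_point(upper, score, start, visit_limit)
    if hit_limit:
        if filter_on_early_exit and ep not in accept:
            return [], True   # guarded: nothing outside the filter leaks out
        return [ep], True     # unguarded: ep was never checked against accept
    # Simplified stand-in for the layer-0 search: filtered scan around ep.
    candidates = [ep, *base.get(ep, ())]
    hits = sorted((n for n in candidates if n in accept), key=score, reverse=True)
    return hits, False


# Hypothetical data: node 0 is the start, only node 2 passes the filter.
scores = {0: 0.1, 1: 0.9, 2: 0.5, 3: 0.95}
upper = {0: [1, 2, 3]}
base = {3: [2]}
accept = {2}

buggy_hits, buggy_incomplete = search(upper, base, scores.get, 0, accept, 2, False)
fixed_hits, fixed_incomplete = search(upper, base, scores.get, 0, accept, 2, True)
full_hits, full_incomplete = search(upper, base, scores.get, 0, accept, 100, True)
```

   With a visit limit of 2, the unguarded call returns node 1 even though it 
is outside `accept`; the guarded call returns an empty, incomplete result.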
   
   Luckily, since the results are marked as incomplete, the `*VectorQuery` logic 
switches back to an exact scan and throws away the results. 
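   
   The fallback can be pictured roughly as follows. This is a minimal sketch 
under stated assumptions, not the `*VectorQuery` implementation: 
`knn_with_fallback`, `exact_scan`, and the `(hits, incomplete)` return shape 
are hypothetical names chosen for illustration.

```python
import heapq


def exact_scan(accept, score, k):
    """Brute force: score every accepted doc and keep the k best."""
    return heapq.nlargest(k, accept, key=score)


def knn_with_fallback(approx_search, accept, score, k):
    """Run the approximate search; on an incomplete result, discard it
    and rescan only the accepted docs exactly."""
    hits, incomplete = approx_search()
    if incomplete:
        return exact_scan(accept, score, k)
    return hits[:k]


# Hypothetical data: doc 1 scores best but is outside the filter.
scores = {1: 0.9, 2: 0.5, 4: 0.7}
accept = {2, 4}

# An incomplete approximate result that leaked doc 1 is thrown away.
recovered = knn_with_fallback(lambda: ([1], True), accept, scores.get, 1)
# A complete approximate result is passed through unchanged.
passed = knn_with_fallback(lambda: ([2], False), accept, scores.get, 1)
```

   Because the exact scan iterates only over `accept`, a leaked document can 
never survive this path, which is why the bug stays hidden behind the query.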
   
   However, if any user called the leaf searcher directly, bypassing the query, 
they could run into this bug.
   
   I ran performance tests and there were no significant latency increases. 
There do seem to be observable latency decreases though at higher `maxConn` 
levels. 
   
   I am getting slightly different recall. Usually better by `0.001`, but worse 
by `0.001` on glove 100 with
   ```
   nDoc fanout  maxConn beamWidth
   100000       20      96      500     120
   ``` 
   
   so I am digging into why that may be. Any help there is appreciated. 
   
   Data (luceneutil knnPerf):
   ```
       dim = 100
       doc_vectors = constants.GLOVE_VECTOR_DOCS_FILE
       query_vectors = '%s/util/tasks/vector-task-100d.vec' % constants.BASE_DIR
   ```
   Settings run:
   ```
   VALUES = {
       'ndoc': (100000,),
       'maxConn': (32, 96),
       'beamWidthIndex': (250, 500,),
       'fanout': (20, 100,),
   }
   ```
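
   For reference, assuming the benchmark runs every combination of these 
values (a Cartesian product, which is an assumption about how the script 
consumes `VALUES`), the grid can be enumerated like this. `expand_grid` is a 
hypothetical helper, not part of luceneutil.

```python
import itertools

VALUES = {
    'ndoc': (100000,),
    'maxConn': (32, 96),
    'beamWidthIndex': (250, 500),
    'fanout': (20, 100),
}


def expand_grid(values):
    """Return one dict per combination of the parameter values."""
    keys = list(values)
    combos = itertools.product(*(values[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]


runs = expand_grid(VALUES)
for run in runs:
    print(run)
```

   Under that assumption the settings above amount to 1 x 2 x 2 x 2 = 8 runs.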


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

