Re: [I] Add IO prefetch to HNSW graph crawl? [lucene]

via GitHub Mon, 19 Jan 2026 05:15:15 -0800


benwtrent commented on issue #15286:
URL: https://github.com/apache/lucene/issues/15286#issuecomment-3768284297


   Prefetching during graph exploration when everything fits in memory (i.e. 
the designed path) has a significant negative performance impact. At least, it 
does for MMAP. This is likely due to a combination of the cost of the MMAP 
prefetch code and an individual prefetch call for every neighbor vector (32+ 
calls per vector op; this adds up, especially for binary vectors, where the 
vector ops themselves are already cheaper than just exploring the graph).
   
   It would be awesome if there were a smarter way to prefetch for vectors (e.g. 
we know whether each prefetch was a hit or a miss, so we could keep track and 
eventually stop prefetching).
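
The hit/miss tracking idea could look roughly like the sketch below. Everything in it is hypothetical: `AdaptivePrefetchGate`, `hitThreshold` and `probeInterval` are made-up names, not existing Lucene APIs. It also assumes the caller has some way to tell whether the prefetched bytes were already resident. After enough consecutive hits the gate stops allowing prefetches and only lets an occasional probe through, so the all-in-memory case pays for very few calls.

```java
/**
 * Hypothetical sketch (not a Lucene class): decides whether issuing a
 * prefetch is still worthwhile, based on how often recent prefetches
 * found the data already resident in memory.
 */
final class AdaptivePrefetchGate {
  private final int hitThreshold;   // consecutive hits before backing off
  private final int probeInterval;  // while backed off, allow 1 in N calls
  private int consecutiveHits;
  private int skippedSinceProbe;

  AdaptivePrefetchGate(int hitThreshold, int probeInterval) {
    if (hitThreshold < 1 || probeInterval < 1) {
      throw new IllegalArgumentException("thresholds must be >= 1");
    }
    this.hitThreshold = hitThreshold;
    this.probeInterval = probeInterval;
  }

  /** Returns true if the caller should issue a prefetch for the next vector. */
  boolean shouldPrefetch() {
    if (consecutiveHits < hitThreshold) {
      return true; // not enough evidence yet that the data is resident
    }
    // Backed off: skip most calls, but probe occasionally so that a later
    // miss (e.g. after memory pressure) is noticed.
    skippedSinceProbe++;
    if (skippedSinceProbe >= probeInterval) {
      skippedSinceProbe = 0;
      return true;
    }
    return false;
  }

  /**
   * Records the outcome of a prefetch that was actually issued.
   *
   * @param alreadyResident true if the data was already in memory (a "hit",
   *     meaning the prefetch was wasted work)
   */
  void record(boolean alreadyResident) {
    if (alreadyResident) {
      if (consecutiveHits < hitThreshold) {
        consecutiveHits++;
      }
    } else {
      // A miss means prefetching is useful again: re-enable it fully.
      consecutiveHits = 0;
      skippedSinceProbe = 0;
    }
  }
}
```

A graph searcher would call `shouldPrefetch()` before each neighbor prefetch and `record(...)` after each prefetch it issues. This only cuts the per-neighbor cost if the gate check is much cheaper than the prefetch call itself, which would need benchmarking.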
   
   I get that prefetching helps in:
   
   - cold case (e.g. files not yet fully mapped)
   - not enough memory case
   
   If we can help those cases without negatively impacting the designed case 
for HNSW (all in memory), then I am all for it!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

