mccullocht commented on issue #15154:
URL: https://github.com/apache/lucene/issues/15154#issuecomment-3259856671

   I think the reason bp reordering makes things faster in general is that the 
grouping reduces the effect of memory latency -- you're much more likely to 
score the query against two vectors that are laid out contiguously or nearly 
contiguously in memory or storage. Reordering could have a similar effect when 
the data is on disk, depending on the size of the vectors relative to the page 
size -- if you can fit dozens of vectors on a page (either because the vectors 
are small or the pages are large) then you can definitely get this effect. 
Latency might still be pretty poor because you will have to issue O(efSearch) 
reads in sequence in the best case, but it would probably behave _way_ better 
than the status quo if, say, only 50% of the index fits in memory. If you are 
using binary vectors, hugepages, or storing in some other blocked format on 
disk this could be very effective.
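
   To make the page-size argument concrete, here is a rough sketch. It assumes 
a flat file of fixed-size vectors addressed by ordinal, which is a 
simplification and not necessarily how any Lucene codec lays data out. The 
class and method names (`PageLocalitySketch`, `vectorsPerPage`, `pagesTouched`) 
and the example sizes are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not a Lucene API: estimates how many pages a batch of
// vector reads touches when vectors are stored back to back in ordinal order.
final class PageLocalitySketch {

  /** Number of whole vectors that fit in one page (0 if a vector exceeds a page). */
  static int vectorsPerPage(int pageBytes, int vectorBytes) {
    return pageBytes / vectorBytes;
  }

  /** Counts the distinct pages covered by the vectors with the given ordinals. */
  static int pagesTouched(int[] ords, int vectorBytes, int pageBytes) {
    Set<Long> pages = new HashSet<>();
    for (int ord : ords) {
      long start = (long) ord * vectorBytes;
      long end = start + vectorBytes - 1;
      // A vector may straddle a page boundary, so record every page it covers.
      for (long p = start / pageBytes; p <= end / pageBytes; p++) {
        pages.add(p);
      }
    }
    return pages.size();
  }

  public static void main(String[] args) {
    int pageBytes = 4096;  // assumed page size
    int binaryVector = 96; // assumed: 768 dimensions at 1 bit each
    int floatVector = 3072; // assumed: 768 dimensions at 4 bytes each

    System.out.println(vectorsPerPage(pageBytes, binaryVector));   // binary vectors, 4 KiB page
    System.out.println(vectorsPerPage(pageBytes, floatVector));    // float vectors, 4 KiB page
    System.out.println(vectorsPerPage(2 * 1024 * 1024, floatVector)); // float vectors, 2 MiB page

    // Neighbors that reordering placed next to each other vs. neighbors spread
    // across the file.
    int[] clustered = {1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007};
    int[] scattered = {10, 5000, 90000, 250000, 400000, 610000, 775000, 990000};
    System.out.println(pagesTouched(clustered, binaryVector, pageBytes));
    System.out.println(pagesTouched(scattered, binaryVector, pageBytes));
  }
}
```

With these assumed sizes, the eight clustered ordinals fall on a single page 
while the eight scattered ones each need their own page, which is the 
difference reordering is trying to exploit.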
   
   Note that you could probably do bp reordering within the codec itself 
without forcing reordering of the docid space, but you would have to store the 
doc <-> ord mapping in at least one direction to serve queries.
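
   A minimal sketch of what that mapping could look like, assuming the dense 
case where every doc has exactly one vector, so the reordering is a plain 
permutation. The class `OrdDocMappingSketch` and its methods are hypothetical 
and are not part of Lucene.

```java
import java.util.Arrays;

// Hypothetical illustration, not a Lucene API: vectors are stored in a
// reordered ordinal space while doc ids stay untouched.
final class OrdDocMappingSketch {

  /** Inverts a permutation so that result[perm[i]] == i. */
  static int[] invert(int[] perm) {
    int[] inverse = new int[perm.length];
    Arrays.fill(inverse, -1);
    for (int i = 0; i < perm.length; i++) {
      int target = perm[i];
      if (target < 0 || target >= perm.length || inverse[target] != -1) {
        throw new IllegalArgumentException("not a permutation");
      }
      inverse[target] = i;
    }
    return inverse;
  }

  /** Translates search hits from ordinal space back to doc ids. */
  static int[] toDocIds(int[] hitOrds, int[] ordToDoc) {
    int[] docs = new int[hitOrds.length];
    for (int i = 0; i < hitOrds.length; i++) {
      docs[i] = ordToDoc[hitOrds[i]];
    }
    return docs;
  }

  public static void main(String[] args) {
    // Assumed output of a reordering pass: ordinal i holds the vector of doc ordToDoc[i].
    int[] ordToDoc = {3, 0, 4, 1, 2};
    // The other direction can be derived when needed instead of being stored.
    int[] docToOrd = invert(ordToDoc);

    System.out.println(Arrays.toString(docToOrd));
    System.out.println(Arrays.toString(toDocIds(new int[] {0, 1}, ordToDoc)));
  }
}
```

Storing only ord -> doc is enough to turn graph hits into doc ids; the inverse 
is derivable, at the cost of extra memory or a rebuild step.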


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

