mccullocht commented on issue #15154: URL: https://github.com/apache/lucene/issues/15154#issuecomment-3259856671
I think the reason bp reordering makes things faster in general is that the grouping reduces the effect of memory latency: you're much more likely to score the query against two vectors that are laid out contiguously or nearly contiguously in memory or storage.

This could have a similar effect if the data is on disk, depending on the size of the vectors relative to the page size. If you can fit dozens of vectors on a page (either because the vectors are small or the pages are large), you should see the same benefit. Latency might still be pretty poor, because even in the best case you have to issue O(efSearch) reads in sequence, but it would probably behave _way_ better than the status quo if, say, only 50% of the index fit in memory. If you are using binary vectors, hugepages, or some other blocked storage format on disk, this could be very effective.

Note that you could probably do bp reordering within the codec itself without forcing a reordering of the docid space, but you would have to store the doc <-> ord mapping in at least one direction to serve queries.
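To make the "vectors per page" argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not Lucene code: the function names, the 768-dimension example, and the page sizes are all assumptions chosen for illustration, and it assumes vectors are packed page-aligned so that none straddles a page boundary.

```python
def vectors_per_page(dims: int, bits_per_dim: int, page_bytes: int) -> int:
    """How many whole vectors fit in one page (hypothetical packed layout)."""
    vector_bytes = (dims * bits_per_dim + 7) // 8  # round up to whole bytes
    return page_bytes // vector_bytes


def pages_touched(ordinals, vecs_per_page: int) -> int:
    """Number of distinct pages read when scoring the given vector ordinals."""
    if vecs_per_page < 1:
        raise ValueError("a vector does not fit in a single page")
    return len({ordinal // vecs_per_page for ordinal in ordinals})


if __name__ == "__main__":
    small_page = 4096             # assumed 4 KiB page
    huge_page = 2 * 1024 * 1024   # assumed 2 MiB huge page

    # Assumed example: 768-dimensional vectors.
    print(vectors_per_page(768, 32, small_page))  # float32 on a 4 KiB page
    print(vectors_per_page(768, 1, small_page))   # binary on a 4 KiB page
    print(vectors_per_page(768, 32, huge_page))   # float32 on a 2 MiB page

    per_page = vectors_per_page(768, 1, small_page)
    clustered = range(1000, 1100)                 # 100 neighbouring ordinals
    scattered = [i * 1000 for i in range(100)]    # 100 far-apart ordinals
    print(pages_touched(clustered, per_page))
    print(pages_touched(scattered, per_page))
```

Under these assumptions a float32 vector nearly fills a 4 KiB page on its own, so reordering cannot help much there, while binary vectors or huge pages pack many vectors per page, and a clustered set of ordinals touches far fewer pages than a scattered one.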