msokolov commented on PR #15430: URL: https://github.com/apache/lucene/pull/15430#issuecomment-3596775130
OK, after finally ironing out the bugs, I have some results. The situation is a little complicated, as the change here really doesn't help much with the typical "dense" index where every document has a vector. I think the reason is that any gains are masked by the additional cost of having a node->doc mapping that must be traversed. On the other hand, in the "sparse" case where some documents have no vectors, we already have such a mapping, so we can see the impact of this change more clearly.

Net-net, in the sparse case we see improvements in search latency that increase with index size. On indexes of 1-2MM I see a 5% improvement; on 10MM, a 10% improvement. As expected, `vex` files show a decrease in size (about 15%). There is also an increase in `vem`, since that is where we store the new node->doc mapping, but this is pretty small.

Merge times go up a lot: this metric varies quite a bit, but the increase seems to be about 100%. It may be possible to reduce the merge times by tweaking the parameters of the BP execution to make it recurse less. I'll see if I can do that while retaining the latency improvements. Then it might be best to enable this only for sparse indexes.
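To illustrate why the dense case pays an extra cost that the sparse case already pays, here is a minimal Python sketch. It is not Lucene code: the function names (`build_node_to_doc`, `needs_explicit_mapping`, `reorder`) and the example permutations are hypothetical, and the real on-disk encoding is not modeled. It only shows that a dense index in doc order has an identity node->doc mapping that can be skipped, while any reordering, or any missing vector, forces an explicit mapping.

```python
from typing import List


def build_node_to_doc(has_vector: List[bool]) -> List[int]:
    """Return the doc ID for each vector node, with nodes numbered in doc order."""
    return [doc for doc, present in enumerate(has_vector) if present]


def needs_explicit_mapping(node_to_doc: List[int]) -> bool:
    """An identity mapping (node i -> doc i) can be implicit; anything else must be stored."""
    return any(node != doc for node, doc in enumerate(node_to_doc))


def reorder(node_to_doc: List[int], new_order: List[int]) -> List[int]:
    """Apply a permutation: new node i takes the doc of old node new_order[i]."""
    return [node_to_doc[old] for old in new_order]


# Dense: every doc has a vector, so node ordinals equal doc IDs.
dense = build_node_to_doc([True, True, True, True])
# Sparse: doc 1 has no vector, so a mapping is needed even without reordering.
sparse = build_node_to_doc([True, False, True, True])

# Hypothetical permutations standing in for the output of a reordering pass.
dense_reordered = reorder(dense, [2, 0, 3, 1])
sparse_reordered = reorder(sparse, [2, 0, 1])

print(needs_explicit_mapping(dense))            # identity, no mapping needed
print(needs_explicit_mapping(dense_reordered))  # reordering adds a mapping
print(needs_explicit_mapping(sparse))           # sparse already had one
```

In this toy model, reordering a dense index turns a free identity lookup into a stored mapping, whereas reordering a sparse index only changes the contents of a mapping that was already there.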
