kaivalnp commented on PR #15784: URL: https://github.com/apache/lucene/pull/15784#issuecomment-4042688192
> being able to allow the searcher to explore the graph more to improve recall seems critical for any interface that interacts with HNSW

Thanks @benwtrent -- having this knob makes sense to me.

I made the decay factor configurable (as mentioned [here](https://github.com/apache/lucene/pull/15784#issuecomment-3999099527)). Graph traversal starts with a large buffer, and decays towards scores of nodes traversed but not collected, with the provided factor.

Baseline
```
recall  latency(ms)  netCPU  avgCpuCount  traversalSimilarity  resultSimilarity  resultCount  visited
 0.987        3.033   3.032        1.000                 0.74               0.8       19.361     8010
 0.983        0.935   0.934        0.999                 0.76               0.8       19.276     2621
 0.973        0.390   0.389        0.998                 0.78               0.8       19.081      962
 0.947        0.190   0.189        0.995                 0.8                0.8       18.566      517
```

Candidate
```
recall  latency(ms)  netCPU  avgCpuCount  decay  resultSimilarity  resultCount  visited
 1.000        1.042   1.041        0.999    0.9               0.8       19.604     2812
 0.997        0.459   0.458        0.998    0.8               0.8       19.555     1396
 0.978        0.271   0.270        0.997    0.5               0.8       19.172      757
 0.963        0.220   0.219        0.996    0.1               0.8       18.896      598
 0.958        0.205   0.204        0.996    0.0               0.8       18.780      563
```

This appears to have a better recall vs. latency tradeoff than traversing the graph based on a fixed `traversalSimilarity`?
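
For anyone following along, here is one way the decaying traversal threshold described above could be sketched. This is a toy Python illustration under stated assumptions, not the code in this PR: the names `simulate`, `initial_buffer`, and `threshold` are hypothetical, and the update rule (an exponential move of the threshold towards the score of each node that was traversed but not collected) is only a guess at what "decays towards scores of nodes traversed but not collected" means. Real HNSW traversal uses a candidate queue over graph neighbors, which is replaced here by a flat list of scores.

```python
from typing import List, Tuple


def simulate(
    scores: List[float],
    result_similarity: float,
    initial_buffer: float,
    decay: float,
) -> Tuple[List[float], List[float], float]:
    """Toy model of a decaying traversal threshold (assumed behaviour).

    scores            -- similarity scores in the order nodes are encountered
    result_similarity -- nodes scoring at least this are collected as results
    initial_buffer    -- hypothetical starting gap below result_similarity
    decay             -- factor in [0, 1]; higher keeps the threshold low longer
    """
    # Assumption: the threshold starts well below result_similarity.
    threshold = result_similarity - initial_buffer
    collected: List[float] = []
    traversed: List[float] = []

    for score in scores:
        if score < threshold:
            # Below the current traversal threshold: not explored.
            continue
        traversed.append(score)
        if score >= result_similarity:
            collected.append(score)
        else:
            # Traversed but not collected: pull the threshold towards this
            # score. decay=1.0 leaves it fixed; decay=0.0 jumps straight to it.
            threshold = decay * threshold + (1.0 - decay) * score

    return collected, traversed, threshold


if __name__ == "__main__":
    demo_scores = [0.85, 0.70, 0.75, 0.72, 0.82, 0.78]
    for d in (0.0, 0.5, 0.9):
        hits, seen, final = simulate(demo_scores, 0.8, 0.3, d)
        print(f"decay={d}: collected={len(hits)} traversed={len(seen)} threshold={final:.3f}")
```

In this toy model the threshold only moves up and always stays below `result_similarity`. Setting `decay=1.0` reduces to a fixed traversal threshold, so a larger decay means more of the graph is explored.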
