msokolov opened a new pull request #2022: URL: https://github.com/apache/lucene-solr/pull/2022
Phew, this has been a long time coming, but I think it is in good shape now. We started with a scratchy prototype about a year ago. Then @mocobeta put it on a better footing by adding a new codec, and also implemented the full hierarchical algorithm, making the graph search faithful to the published literature. After that we took a step back and added the underlying vector format as a separate patch, which has now landed.

This patch builds on the new vector format to provide KNN search with NSW graphs. It's the simplest implementation I could tease out (single-layer graph, simple neighbor selection, no max fanout control), but I think it will be a good foundation.

I've done some pretty extensive performance testing and hyperparameter exploration using the (included) KnnGraphTester with some proprietary data, and I get good results. I will follow up later with specifics, but single-threaded latencies of a few ms on my i7 laptop over a 1M x 256-dim dataset seem pretty good. Follow-ups will include repeatable benchmarks on public datasets.
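
For readers new to the approach, here is a rough Python sketch of the general idea behind a single-layer NSW graph: nodes are inserted one at a time, each is linked to the nearest nodes found by a best-first search of the graph built so far, and back-links are added without pruning (so there is no fanout cap). This is not the code in this patch. The names `build`, `search`, `max_conn` and `beam_width`, the fixed entry point at node 0, and the use of squared Euclidean distance are all assumptions made for illustration.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(vectors, neighbors, query, entry, beam_width):
    """Best-first graph search starting at `entry`.

    Returns up to `beam_width` (distance, node) pairs, nearest first.
    """
    d0 = sq_dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node on top
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        # Stop once the closest candidate cannot improve a full result set.
        if len(results) >= beam_width and d > -results[0][0]:
            break
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_dist(query, vectors[nb])
            if len(results) < beam_width or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > beam_width:
                    heapq.heappop(results)  # drop the current worst
    return sorted((-nd, n) for nd, n in results)


def build(vectors, max_conn, beam_width):
    """Insert nodes in order, linking each to its nearest found neighbors."""
    neighbors = [[] for _ in vectors]
    for node in range(1, len(vectors)):
        # Only already-inserted nodes are reachable from the entry point.
        found = search(vectors, neighbors, vectors[node], 0, beam_width)
        for _, nb in found[:max_conn]:
            neighbors[node].append(nb)
            neighbors[nb].append(node)  # back-link, never pruned
    return neighbors


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(8)] for _ in range(200)]
    graph = build(data, max_conn=6, beam_width=20)
    query = [rng.random() for _ in range(8)]
    for dist, node in search(data, graph, query, 0, beam_width=20)[:5]:
        print(node, round(dist, 4))
```

In this sketch a larger `beam_width` trades latency for recall, and the unpruned back-links mean that early nodes can accumulate many neighbors, which is the kind of behavior a max fanout control would address.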