jtibshirani commented on issue #1314: LUCENE-9136: Coarse quantization that reuses existing formats.
URL: https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326

**Benchmarks**

In these benchmarks, we find the nearest k=10 vectors and record the recall and queries per second. For the number of centroids, we use the heuristic `num centroids = sqrt(dataset size)`.

sift-128-euclidean: a dataset of 1 million SIFT descriptors with 128 dims.

```
APPROACH                      RECALL       QPS
LuceneExact()                  1.000     6.425
LuceneCluster(n_probes=5)      0.756   604.133
LuceneCluster(n_probes=10)     0.874   323.791
LuceneCluster(n_probes=20)     0.951   166.580
LuceneCluster(n_probes=50)     0.993    68.465
LuceneCluster(n_probes=100)    0.999    35.139
```

glove-100-angular: a dataset of ~1.2 million GloVe word vectors of 100 dims.

```
APPROACH                      RECALL       QPS
LuceneExact()                  1.000     6.764
LuceneCluster(n_probes=5)      0.681   642.247
LuceneCluster(n_probes=10)     0.768   343.067
LuceneCluster(n_probes=20)     0.836   177.037
LuceneCluster(n_probes=50)     0.908    73.256
LuceneCluster(n_probes=100)    0.951    37.302
```

These benchmarks were performed using the [ann-benchmarks repo](https://github.com/erikbern/ann-benchmarks). The branch that I use to perform benchmarking
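The comment itself contains no code, so the following is only a minimal pure-Python sketch of what the recall metric and the `n_probes` knob measure, assuming an IVF-style coarse quantizer scored with squared Euclidean distance (matching sift-128-euclidean; the glove-100-angular run would use angular distance instead). All function names are hypothetical and are not the Lucene branch's API. For brevity, centroids are sampled at random from the data rather than trained (for example with k-means); how the PR actually chooses centroids is not stated in this comment.

```python
import math
import random


def squared_l2(a, b):
    # Squared Euclidean distance; ranking by it matches ranking by true L2 distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def num_centroids(dataset_size):
    # Heuristic from the comment: num centroids = sqrt(dataset size).
    return max(1, round(math.sqrt(dataset_size)))


def exact_knn(vectors, query, k):
    # Brute force over every vector, the kind of scan LuceneExact() stands for.
    return sorted(range(len(vectors)), key=lambda i: squared_l2(vectors[i], query))[:k]


def build_index(vectors, centroids):
    # Coarse quantization: assign each vector to the posting list of its nearest centroid.
    postings = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda j: squared_l2(v, centroids[j]))
        postings[c].append(i)
    return postings


def cluster_knn(vectors, centroids, postings, query, k, n_probes):
    # Visit only the n_probes centroids closest to the query, then score their members exactly.
    order = sorted(range(len(centroids)), key=lambda j: squared_l2(query, centroids[j]))
    candidates = [i for c in order[:n_probes] for i in postings[c]]
    return sorted(candidates, key=lambda i: squared_l2(vectors[i], query))[:k]


def recall_at_k(approx, exact):
    # Fraction of the true k nearest neighbors that the approximate search returned.
    return len(set(approx) & set(exact)) / len(exact)


if __name__ == "__main__":
    random.seed(0)
    dim = 8
    vectors = [[random.random() for _ in range(dim)] for _ in range(2500)]
    centroids = random.sample(vectors, num_centroids(len(vectors)))  # 50 centroids
    postings = build_index(vectors, centroids)
    query = [random.random() for _ in range(dim)]
    exact = exact_knn(vectors, query, k=10)
    for n_probes in (1, 5, 10, 20, 50):
        approx = cluster_knn(vectors, centroids, postings, query, k=10, n_probes=n_probes)
        print(f"n_probes={n_probes:3d} recall={recall_at_k(approx, exact):.3f}")
```

Raising `n_probes` adds candidates, so for a given query recall can only stay the same or rise while the work per query grows, which is the trade-off visible in the tables above. With `n_probes` equal to the number of centroids, the search scores every vector and returns the exact result.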