[
https://issues.apache.org/jira/browse/LUCENE-9937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julie Tibshirani resolved LUCENE-9937.
--------------------------------------
Resolution: Done
> ann-benchmarks results for HNSW search
> --------------------------------------
>
> Key: LUCENE-9937
> URL: https://issues.apache.org/jira/browse/LUCENE-9937
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Julie Tibshirani
> Priority: Minor
>
> *NOTE: These benchmarks contained an error that made hnswlib perform too
> well. See the last comment for correct results.*
> I hooked up our HNSW implementation to
> [ann-benchmarks|https://github.com/erikbern/ann-benchmarks], a widely used
> repo for benchmarking nearest neighbor search libraries against large
> datasets. I found the results interesting and opened this issue to share and
> discuss. My benchmarking code can be found
> [here|https://github.com/jtibshirani/lucene/pull/1] – it's hacky and not easy
> to commit but I’m happy to help anyone get set up with it.
> Approaches
> * LuceneVectorsOnly: a baseline that only indexes vectors, and performs a
> brute force scan to determine nearest neighbors
> * LuceneHnsw: our HNSW implementation
> * [hnswlib|https://github.com/nmslib/hnswlib]: a C++ HNSW implementation
> from the author of the paper
> Datasets
> * sift-128-euclidean: 1 million SIFT feature vectors, dimension 128,
> comparing euclidean distance
> * glove-100-angular: ~1.2 million GloVe word vectors, dimension 100,
> comparing cosine similarity
> *Results on sift-128-euclidean*
> Parameters: M=16, efConstruction=500
> {code:java}
> Approach Recall QPS
> LuceneVectorsOnly() 1.000 6.764
> LuceneHnsw(n_cands=10) 0.603 7736.968
> LuceneHnsw(n_cands=50) 0.890 3605.000
> LuceneHnsw(n_cands=100) 0.953 2237.429
> LuceneHnsw(n_cands=500) 0.996 570.900
> LuceneHnsw(n_cands=800) 0.998 379.589
> hnswlib(n_cands=10) 0.713 69662.756
> hnswlib(n_cands=50) 0.950 28021.582
> hnswlib(n_cands=100) 0.985 16108.538
> hnswlib(n_cands=500) 1.000 4115.886
> hnswlib(n_cands=800) 1.000 2729.680
> {code}
> *Results on glove-100-angular*
> Parameters: M=32, efConstruction=500
> {code:java}
> Approach Recall QPS
> LuceneVectorsOnly() 1.000 6.722
> LuceneHnsw(n_cands=10) 0.507 5036.236
> LuceneHnsw(n_cands=50) 0.760 2099.850
> LuceneHnsw(n_cands=100) 0.833 1233.690
> LuceneHnsw(n_cands=500) 0.941 309.077
> LuceneHnsw(n_cands=800) 0.961 203.782
> hnswlib(n_cands=10) 0.597 43543.345
> hnswlib(n_cands=50) 0.832 14719.165
> hnswlib(n_cands=100) 0.897 8299.948
> hnswlib(n_cands=500) 0.981 1931.985
> hnswlib(n_cands=800) 0.991 881.752
> {code}
> Notes on benchmark:
> * By default, the ann-benchmarks repo retrieves 10 nearest neighbors and
> computes the recall against the true neighbors. The recall calculation has a
> small 'fudge factor' that allows neighbors that are within a small epsilon of
> the best distance. Queries are executed serially to obtain the QPS.
> * I chose parameters where hnswlib performed well, then passed these same
> parameters to Lucene HNSW. For index-time parameters, I set maxConn as M and
> beamWidth as efConstruction. For search parameters, I set k to k, and fanout
> as (num_cands - k) so that the beam search is of size num_cands. Note that
> our default value for beamWidth is 16, which is really low – I wasn’t able to
> obtain acceptable recall until I bumped it to closer to 500 to match the
> hnswlib default.
> * I force-merged to one segment before running searches since this gives the
> best recall + QPS, and also to match hnswlib.
> Some observations:
> * It'd be really nice to extend luceneutil to measure vector search recall
> in addition to latency. That would help ensure we’re benchmarking a more
> realistic scenario, instead of accidentally indexing/ searching at a very low
> recall. Tracking recall would also guard against subtle, unintentional
> changes to the algorithm. It's easy to make an accidental change while
> refactoring, and with approximate algorithms, unit tests don't guard as well
> against this.
> * Lucene HNSW gives a great speed-up over the baseline without sacrificing
> too much recall. But it doesn't perform as well as hnswlib in terms of both
> recall and QPS. We wouldn’t expect the results to line up perfectly, since
> Lucene doesn't actually implement HNSW – the current algorithm isn't actually
> hierarchical and only uses a single graph layer. Does this difference might
> indicate we're leaving performance 'on the table' by not using layers, which
> (I don't think) adds much index time or space? Or are there other algorithmic
> improvements would help close the gap?
> * Setting beamWidth to 500 *really* slowed down indexing. I'll open a
> separate issue with indexing speed results, keeping this one focused on
> search.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]