Michael Sokolov created LUCENE-9644: ---------------------------------------
Summary: HNSW diverse neighbor selection heuristic Key: LUCENE-9644 URL: https://issues.apache.org/jira/browse/LUCENE-9644 Project: Lucene - Core Issue Type: Improvement Reporter: Michael Sokolov This will replace the simple nearest neighbor selection with a criterion that takes into account the distance of the neighbors from each other. It is seen to provide dramatically improved recall on at least two datasets, and is what is being used by our reference implementation, hnswlib. The basic idea is that when selecting the nearest neighbors to associate with a new node added to the graph, we filter using a diversity criterion. If a candidate neighbor is closer to an already-added (closer to the new node) neighbor than it is to the new node, then we pass over it, moving on to more-distant, but presumably more diverse neighbors. The same criterion is also (re-) applied to the neighbor nodes' neighbors, since we add the links bidirectionally. ## Results: ### GloVe/Wikipedia #### baseline ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index ms|| |0.643| 0.77| 100000| 50| 32| 64| 1742| 22830| |0.671| 0.95| 100000| 100| 32| 64| 2141| 0| |0.704| 1.32| 100000| 200| 32| 64| 2923| 0| |0.739| 2.04| 100000| 400| 32| 64| 4382| 0| |0.470| 0.91| 1000000| 50| 32| 64| 2068| 337081| |0.496| 1.21| 1000000| 100| 32| 64| 2548| 0| |0.533| 1.77| 1000000| 200| 32| 64| 3479| 0| |0.573| 2.58| 1000000| 400| 32| 64| 5257| 0| #### diverse ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index ms|| |0.801| 0.57| 100000| 50| 32| 64| 593| 17985| |0.840| 0.67| 100000| 100| 32| 64| 738| 0| |0.883| 0.97| 100000| 200| 32| 64| 1018| 0| |0.921| 1.36| 100000| 400| 32| 64| 1502| 0| |0.723| 0.71| 1000000| 50| 32| 64| 860| 298383| |0.761| 0.77| 1000000| 100| 32| 64| 1058| 0| |0.806| 1.06| 1000000| 200| 32| 64| 1442| 0| |0.854| 1.67| 1000000| 400| 32| 64| 2159| 0| ### Dataset from work: #### baseline ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index ms|| |0.933| 1.41| 100000| 50| 32| 64| 1496| 35462| |0.948| 1.39| 100000| 100| 32| 64| 1872| 0| |0.961| 2.10| 100000| 200| 32| 64| 2591| 0| |0.972| 3.04| 100000| 400| 32| 64| 3939| 0| |0.827| 1.34| 1000000| 50| 32| 64| 1676| 535802| |0.854| 1.76| 1000000| 100| 32| 64| 2056| 0| |0.887| 2.47| 1000000| 200| 32| 64| 2761| 0| |0.907| 3.75| 1000000| 400| 32| 64| 4129| 0| #### diverse ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index ms|| |0.966| 1.18| 100000| 50| 32| 64| 1480| 37656| |0.977| 1.46| 100000| 100| 32| 64| 1832| 0| |0.988| 2.00| 100000| 200| 32| 64| 2472| 0| |0.995| 3.14| 100000| 400| 32| 64| 3629| 0| |0.944| 1.34| 1000000| 50| 32| 64| 1780| 526834| |0.959| 1.71| 1000000| 100| 32| 64| 2222| 0| |0.975| 2.30| 1000000| 200| 32| 64| 3041| 0| |0.986| 3.56| 1000000| 400| 32| 64| 4543| 0| -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org