[
https://issues.apache.org/jira/browse/LUCENE-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Sokolov updated LUCENE-9644:
------------------------------------
Status: Open (was: Open)
https://github.com/apache/lucene-solr/pull/2157
> HNSW diverse neighbor selection heuristic
> -----------------------------------------
>
> Key: LUCENE-9644
> URL: https://issues.apache.org/jira/browse/LUCENE-9644
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael Sokolov
> Priority: Major
>
> This will replace the simple nearest neighbor selection with a criterion that
> takes into account the distance of the neighbors from each other. It is seen
> to provide dramatically improved recall on at least two datasets, and is what
> is being used by our reference implementation, hnswlib. The basic idea is
> that when selecting the nearest neighbors to associate with a new node added
> to the graph, we filter using a diversity criterion. If a candidate neighbor
> is closer to an already-added (closer to the new node) neighbor than it is to
> the new node, then we pass over it, moving on to more-distant, but presumably
> more diverse neighbors. The same criterion is also (re-) applied to the
> neighbor nodes' neighbors, since we add the links bidirectionally.
> h2. Results:
> h3. GloVe/Wikipedia
> h4. baseline
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index
> ms||
> |0.643| 0.77| 100000| 50| 32| 64| 1742| 22830|
> |0.671| 0.95| 100000| 100| 32| 64| 2141| 0|
> |0.704| 1.32| 100000| 200| 32| 64| 2923| 0|
> |0.739| 2.04| 100000| 400| 32| 64| 4382| 0|
> |0.470| 0.91| 1000000| 50| 32| 64| 2068| 337081|
> |0.496| 1.21| 1000000| 100| 32| 64| 2548| 0|
> |0.533| 1.77| 1000000| 200| 32| 64| 3479| 0|
> |0.573| 2.58| 1000000| 400| 32| 64| 5257| 0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index
> ms||
> |0.801| 0.57| 100000| 50| 32| 64| 593| 17985|
> |0.840| 0.67| 100000| 100| 32| 64| 738| 0|
> |0.883| 0.97| 100000| 200| 32| 64| 1018| 0|
> |0.921| 1.36| 100000| 400| 32| 64| 1502| 0|
> |0.723| 0.71| 1000000| 50| 32| 64| 860| 298383|
> |0.761| 0.77| 1000000| 100| 32| 64| 1058| 0|
> |0.806| 1.06| 1000000| 200| 32| 64| 1442| 0|
> |0.854| 1.67| 1000000| 400| 32| 64| 2159| 0|
> h3. Dataset from work:
> h4. baseline
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index
> ms||
> |0.933| 1.41| 100000| 50| 32| 64| 1496| 35462|
> |0.948| 1.39| 100000| 100| 32| 64| 1872| 0|
> |0.961| 2.10| 100000| 200| 32| 64| 2591| 0|
> |0.972| 3.04| 100000| 400| 32| 64| 3939| 0|
> |0.827| 1.34| 1000000| 50| 32| 64| 1676| 535802|
> |0.854| 1.76| 1000000| 100| 32| 64| 2056| 0|
> |0.887| 2.47| 1000000| 200| 32| 64| 2761| 0|
> |0.907| 3.75| 1000000| 400| 32| 64| 4129| 0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index
> ms||
> |0.966| 1.18| 100000| 50| 32| 64| 1480| 37656|
> |0.977| 1.46| 100000| 100| 32| 64| 1832| 0|
> |0.988| 2.00| 100000| 200| 32| 64| 2472| 0|
> |0.995| 3.14| 100000| 400| 32| 64| 3629| 0|
> |0.944| 1.34| 1000000| 50| 32| 64| 1780| 526834|
> |0.959| 1.71| 1000000| 100| 32| 64| 2222| 0|
> |0.975| 2.30| 1000000| 200| 32| 64| 3041| 0|
> |0.986| 3.56| 1000000| 400| 32| 64| 4543| 0|
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]