Michael Sokolov created LUCENE-9644:
---------------------------------------

             Summary: HNSW diverse neighbor selection heuristic
                 Key: LUCENE-9644
                 URL: https://issues.apache.org/jira/browse/LUCENE-9644
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael Sokolov


This will replace the simple nearest neighbor selection with a criterion that 
takes into account the distance of the neighbors from each other. It is seen to 
provide dramatically improved recall on at least two datasets, and is what is 
being used by our reference implementation, hnswlib. The basic idea is that 
when selecting  the nearest neighbors to associate with a new node added to the 
graph, we filter using a diversity criterion. If a candidate neighbor is closer 
to an already-added (closer to the new node) neighbor than it is to the new 
node, then we pass over it, moving on to more-distant, but presumably more 
diverse neighbors. The same criterion is also (re-) applied to the neighbor 
nodes' neighbors, since we add the links bidirectionally.

## Results:
### GloVe/Wikipedia

#### baseline
||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||    index 
ms||
|0.643|  0.77|  100000| 50|     32|     64|     1742|   22830|
|0.671|  0.95|  100000| 100|    32|     64|     2141|   0|
|0.704|  1.32|  100000| 200|    32|     64|     2923|   0|
|0.739|  2.04|  100000| 400|    32|     64|     4382|   0|
|0.470|  0.91|  1000000|        50|     32|     64|     2068|   337081|
|0.496|  1.21|  1000000|        100|    32|     64|     2548|   0|
|0.533|  1.77|  1000000|        200|    32|     64|     3479|   0|
|0.573|  2.58|  1000000|        400|    32|     64|     5257|   0|

#### diverse
||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||    index 
ms||
|0.801|  0.57|  100000| 50|     32|     64|     593|    17985|
|0.840|  0.67|  100000| 100|    32|     64|     738|    0|
|0.883|  0.97|  100000| 200|    32|     64|     1018|   0|
|0.921|  1.36|  100000| 400|    32|     64|     1502|   0|
|0.723|  0.71|  1000000|        50|     32|     64|     860|    298383|
|0.761|  0.77|  1000000|        100|    32|     64|     1058|   0|
|0.806|  1.06|  1000000|        200|    32|     64|     1442|   0|
|0.854|  1.67|  1000000|        400|    32|     64|     2159|   0|


### Dataset from work:
#### baseline
||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||    index 
ms||
|0.933|  1.41|  100000| 50|     32|     64|     1496|   35462|
|0.948|  1.39|  100000| 100|    32|     64|     1872|   0|
|0.961|  2.10|  100000| 200|    32|     64|     2591|   0|
|0.972|  3.04|  100000| 400|    32|     64|     3939|   0|
|0.827|  1.34|  1000000|        50|     32|     64|     1676|   535802|
|0.854|  1.76|  1000000|        100|    32|     64|     2056|   0|
|0.887|  2.47|  1000000|        200|    32|     64|     2761|   0|
|0.907|  3.75|  1000000|        400|    32|     64|     4129|   0|

#### diverse
||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited||    index 
ms||
|0.966|  1.18|  100000| 50|     32|     64|     1480|   37656|
|0.977|  1.46|  100000| 100|    32|     64|     1832|   0|
|0.988|  2.00|  100000| 200|    32|     64|     2472|   0|
|0.995|  3.14|  100000| 400|    32|     64|     3629|   0|
|0.944|  1.34|  1000000|        50|     32|     64|     1780|   526834|
|0.959|  1.71|  1000000|        100|    32|     64|     2222|   0|
|0.975|  2.30|  1000000|        200|    32|     64|     3041|   0|
|0.986|  3.56|  1000000|        400|    32|     64|     4543|   0|




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to