shubhamvishu commented on PR #16092:
URL: https://github.com/apache/lucene/pull/16092#issuecomment-4504886529
#### Luceneutil with Amazon 4K vector embeddings (forceMerge=False)
NOTE: Runs 1 and 2 use separate 4K embedding datasets (500K docs each), so I'm sharing both.
<details>
<summary><b>Run 1 </b></summary>
#### Baseline:
```
Results:
NOTE: nDoc = 500000 for all runs; skipping column
NOTE: searchType = KNN for all runs; skipping column
NOTE: topK = 100 for all runs; skipping column
NOTE: fanout = 100 for all runs; skipping column
NOTE: resultSimilarity = N/A for all runs; skipping column
NOTE: decay = N/A for all runs; skipping column
NOTE: resultCount = 100.000 for all runs; skipping column
NOTE: maxConn = 64 for all runs; skipping column
NOTE: beamWidth = 250 for all runs; skipping column
NOTE: force_merge(s) = 0.00 for all runs; skipping column
NOTE: filterStrategy = null for all runs; skipping column
NOTE: filterSelectivity = N/A for all runs; skipping column
NOTE: overSample = 1.000 for all runs; skipping column
NOTE: bp-reorder = false for all runs; skipping column
NOTE: indexType = HNSW for all runs; skipping column
NOTE: rerank = no for all runs; skipping column
recall  latency(ms)  netCPU  avgCpuCount  quantized  visited  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)
 0.847        1.502   6.435        4.284     1 bits    30107    122.60       4078.17            11         8090.94      8063.316      250.816
 0.873        1.753   7.827        4.464     2 bits    28103    124.40       4019.26            12         8333.67      8307.457      494.957
 0.906        2.459  10.420        4.238     4 bits    28075    123.27       4056.01            12         8821.31      8796.692      984.192
 0.931        3.068  11.445        3.730     7 bits    19281    145.29       3441.42             7         9798.58      9773.254     1960.754
 0.936        3.467  10.623        3.064     8 bits    18113    144.72       3454.88             6         9798.82      9773.254     1960.754
```
#### Candidate:
```
Results:
NOTE: nDoc = 500000 for all runs; skipping column
NOTE: searchType = KNN for all runs; skipping column
NOTE: topK = 100 for all runs; skipping column
NOTE: fanout = 100 for all runs; skipping column
NOTE: resultSimilarity = N/A for all runs; skipping column
NOTE: decay = N/A for all runs; skipping column
NOTE: resultCount = 100.000 for all runs; skipping column
NOTE: maxConn = 64 for all runs; skipping column
NOTE: beamWidth = 250 for all runs; skipping column
NOTE: force_merge(s) = 0.00 for all runs; skipping column
NOTE: filterStrategy = null for all runs; skipping column
NOTE: filterSelectivity = N/A for all runs; skipping column
NOTE: overSample = 1.000 for all runs; skipping column
NOTE: bp-reorder = false for all runs; skipping column
NOTE: indexType = HNSW for all runs; skipping column
NOTE: rerank = no for all runs; skipping column
recall  latency(ms)  netCPU  avgCpuCount  quantized  visited  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)
 0.906        1.176   4.908        4.174     1 bits    22856    134.87       3707.16             9         8090.28      8063.316      250.816
 0.936        1.459   5.608        3.845     2 bits    20761    126.32       3958.30             9         8333.64      8307.457      494.957
 0.971        2.228   9.872        4.430     4 bits    27231    140.24       3565.42            12         8821.80      8796.692      984.192
 0.988        3.446  15.027        4.361     7 bits    25644    148.67       3363.27            10         9798.57      9773.254     1960.754
 0.989        3.164  11.303        3.573     8 bits    19280    145.93       3426.35             7         9799.14      9773.254     1960.754
```
</details>
<details>
<summary><b>Run 2 </b></summary>
#### Baseline:
```
Results:
NOTE: nDoc = 500000 for all runs; skipping column
NOTE: searchType = KNN for all runs; skipping column
NOTE: topK = 100 for all runs; skipping column
NOTE: fanout = 100 for all runs; skipping column
NOTE: resultSimilarity = N/A for all runs; skipping column
NOTE: decay = N/A for all runs; skipping column
NOTE: resultCount = 100.000 for all runs; skipping column
NOTE: maxConn = 64 for all runs; skipping column
NOTE: beamWidth = 250 for all runs; skipping column
NOTE: force_merge(s) = 0.00 for all runs; skipping column
NOTE: filterStrategy = null for all runs; skipping column
NOTE: filterSelectivity = N/A for all runs; skipping column
NOTE: overSample = 1.000 for all runs; skipping column
NOTE: bp-reorder = false for all runs; skipping column
NOTE: indexType = HNSW for all runs; skipping column
NOTE: rerank = no for all runs; skipping column
recall  latency(ms)  netCPU  avgCpuCount  quantized  visited  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)
 0.813        1.145   4.208        3.676     1 bits    17977    121.83       4104.15             8         8089.82      8063.316      250.816
 0.849        1.423   6.169        4.336     2 bits    21256    115.82       4317.01            10         8332.16      8307.457      494.957
 0.885        1.805   6.790        3.761     4 bits    17201    130.58       3829.22             9         8820.67      8796.692      984.192
 0.921        2.918  12.660        4.339     7 bits    20622    136.97       3650.57            10         9796.88      9773.254     1960.754
 0.926        2.678   8.020        2.995     8 bits    13287    141.24       3540.07             5         9797.46      9773.254     1960.754
```
#### Candidate:
```
Results:
NOTE: nDoc = 500000 for all runs; skipping column
NOTE: searchType = KNN for all runs; skipping column
NOTE: topK = 100 for all runs; skipping column
NOTE: fanout = 100 for all runs; skipping column
NOTE: resultSimilarity = N/A for all runs; skipping column
NOTE: decay = N/A for all runs; skipping column
NOTE: resultCount = 100.000 for all runs; skipping column
NOTE: maxConn = 64 for all runs; skipping column
NOTE: beamWidth = 250 for all runs; skipping column
NOTE: force_merge(s) = 0.00 for all runs; skipping column
NOTE: filterStrategy = null for all runs; skipping column
NOTE: filterSelectivity = N/A for all runs; skipping column
NOTE: overSample = 1.000 for all runs; skipping column
NOTE: bp-reorder = false for all runs; skipping column
NOTE: indexType = HNSW for all runs; skipping column
NOTE: rerank = no for all runs; skipping column
recall  latency(ms)  netCPU  avgCpuCount  quantized  visited  index(s)  index_docs/s  num_segments  index_size(MB)  vec_disk(MB)  vec_RAM(MB)
 0.878        1.104   4.588        4.154     1 bits    20309    133.12       3756.15             9         8089.49      8063.316      250.816
 0.906        1.333   5.127        3.847     2 bits    17963    136.62       3659.87             9         8333.10      8307.457      494.957
 0.957        1.797   5.546        3.085     4 bits    15179    140.91       3548.41             6         8821.53      8796.692      984.192
 0.969        2.928  10.953        3.741     7 bits    17888    143.28       3489.79             9         9797.76      9773.254     1960.754
 0.969        3.004  11.197        3.727     8 bits    18177    136.01       3676.17             9         9797.79      9773.254     1960.754
```
</details>
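
For a quick side-by-side, here is a rough Python sketch that computes the candidate-vs-baseline recall delta and latency change per bit width. It is not part of luceneutil; the `RUNS` dict and the `deltas` helper are hypothetical names used only for illustration, and the numbers are copied by hand from the `recall` and `latency(ms)` columns of the tables above.

```python
# Hypothetical helper, not part of luceneutil.
# Values are (recall, latency_ms) keyed by quantization bits,
# copied by hand from the result tables in this comment.
RUNS = {
    "Run 1": {
        "baseline": {1: (0.847, 1.502), 2: (0.873, 1.753), 4: (0.906, 2.459),
                     7: (0.931, 3.068), 8: (0.936, 3.467)},
        "candidate": {1: (0.906, 1.176), 2: (0.936, 1.459), 4: (0.971, 2.228),
                      7: (0.988, 3.446), 8: (0.989, 3.164)},
    },
    "Run 2": {
        "baseline": {1: (0.813, 1.145), 2: (0.849, 1.423), 4: (0.885, 1.805),
                     7: (0.921, 2.918), 8: (0.926, 2.678)},
        "candidate": {1: (0.878, 1.104), 2: (0.906, 1.333), 4: (0.957, 1.797),
                      7: (0.969, 2.928), 8: (0.969, 3.004)},
    },
}


def deltas(baseline, candidate):
    """Return {bits: (recall_delta, latency_pct_change)} for candidate vs baseline."""
    out = {}
    for bits, (b_recall, b_latency) in baseline.items():
        c_recall, c_latency = candidate[bits]
        recall_delta = round(c_recall - b_recall, 3)
        latency_pct = round(100.0 * (c_latency - b_latency) / b_latency, 1)
        out[bits] = (recall_delta, latency_pct)
    return out


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(name)
        for bits, (d_recall, d_latency) in deltas(run["baseline"], run["candidate"]).items():
            print(f"  {bits} bits: recall {d_recall:+.3f}, latency {d_latency:+.1f}%")
```

Since these are single runs without force merge and with differing segment counts, the latency deltas in particular should be read as indicative rather than precise.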
cc - @mccullocht