atris commented on PR #15613:
URL: https://github.com/apache/lucene/pull/15613#issuecomment-3899240459
I finished the 16.41M-doc benchmark on a 32GB RAM machine. SPANN with R=1 is the
lowest-latency configuration observed, but its recall tops out at ~0.847 even with
higher NProbe. R=2 fails at full scale because the disk runs out of space. HNSW
reaches ~0.99 recall but at ~2s latency, so it is not viable for interactive use
on this hardware. The recall ceiling is hardware-bound, not a SPANN limit: higher
recall is achievable when replication is feasible, as validated at 10M (R=2,
0.933 recall).
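To make the disk-exhaustion point concrete, here is a rough Java estimate of how posting-list storage grows with the replication factor R. It is a sketch under assumptions not stated in the comment: each posting entry is taken to hold a full float32 vector plus a 4-byte docId, and the 768 dimension is hypothetical. The actual on-disk format in this PR may differ.

```java
/**
 * Rough estimate of SPANN posting-list storage under replication.
 * ASSUMPTION: each posting entry stores a full float32 vector plus a 4-byte docId;
 * the real on-disk format may add headers, quantization, or compression.
 */
public class ReplicationFootprint {

    /** Bytes needed when each of numVectors vectors (dim float32s) is written to `replicas` posting lists. */
    public static long postingBytes(long numVectors, int dim, int replicas) {
        long perEntry = (long) dim * Float.BYTES + Integer.BYTES; // vector + docId
        return numVectors * replicas * perEntry;
    }

    public static void main(String[] args) {
        long n = 16_410_000L; // corpus size from the benchmark
        int dim = 768;        // HYPOTHETICAL dimension, not stated in the comment
        for (int r = 1; r <= 2; r++) {
            double gib = postingBytes(n, dim, r) / (1024.0 * 1024 * 1024);
            System.out.printf("R=%d: ~%.1f GiB of posting data%n", r, gib);
        }
    }
}
```

Whatever the real per-entry size, the total is linear in R, so going from R=1 to R=2 roughly doubles the posting data written at 16.41M.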
luceneutil SUMMARY loglines:
SPANN R=1, 16.41M (NProbe 12/24/48)
SUMMARY: 0.833 46.500 … 16410000 … 64319.01 … SPANN
SUMMARY: 0.844 12.320 … 16410000 … 64319.02 … SPANN
SUMMARY: 0.847 13.400 … 16410000 … 64318.87 … SPANN
SPANN R=2 failure, 16.41M
HNSW, 16.41M
SUMMARY: 0.991 1849.740 … 16410000 … HNSW
SUMMARY: 0.993 2024.140 … 16410000 … HNSW
SUMMARY: 0.996 2076.020 … 16410000 … HNSW
SPANN R=2 success, 10M
SUMMARY: 0.933 60.550 … 10000000 … SPANN
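The SUMMARY lines can be compared mechanically. The Java sketch below assumes, based on the numbers above (e.g. the ~2s HNSW latency matching 1849.740), that the first field is recall, the second is latency in milliseconds, and the last token names the algorithm. The class and method names are hypothetical and not part of luceneutil; elided fields are written as "..." and ignored.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SummaryLines {

    /** Recall, latency and algorithm name pulled from one SUMMARY line. */
    public record Result(double recall, double latencyMs, String algo) {}

    /** Parses "SUMMARY: <recall> <latencyMs> ... <algo>"; ASSUMES this field order. */
    public static Result parse(String line) {
        String body = line.substring(line.indexOf(':') + 1).trim();
        String[] f = body.split("\\s+");
        return new Result(Double.parseDouble(f[0]), Double.parseDouble(f[1]), f[f.length - 1]);
    }

    /** Lowest-latency result whose recall meets the target, or null if none does. */
    public static Result fastestAtRecall(List<Result> results, double minRecall) {
        return results.stream()
                .filter(r -> r.recall() >= minRecall)
                .min(Comparator.comparingDouble(Result::latencyMs))
                .orElse(null);
    }

    public static void main(String[] args) {
        List<Result> rs = new ArrayList<>();
        for (String s : List.of(
                "SUMMARY: 0.844 12.320 ... 16410000 ... 64319.02 ... SPANN",
                "SUMMARY: 0.991 1849.740 ... 16410000 ... HNSW")) {
            rs.add(parse(s));
        }
        System.out.println(fastestAtRecall(rs, 0.80));
        System.out.println(fastestAtRecall(rs, 0.90));
    }
}
```

Fed all of the 16.41M lines above with a 0.80 recall target, it would select the 12.320 ms SPANN run; any target above 0.847 leaves only the HNSW runs.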
Current work: a disk-efficient SPANN build that keeps only (partitionId, docId)
pairs as scratch, plus HNSW over the centroids for partition assignment, to
reduce scratch space and indexing time. The goal is to make R=2 feasible at 16.4M.
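The build change described above can be sketched as follows. This is not the PR's code: it is a minimal Java illustration, with hypothetical names, of assigning each vector to its R nearest centroids and keeping only packed (partitionId, docId) pairs as scratch. Brute-force search over the centroids stands in for the centroid HNSW, which would replace `nearestCentroids` at scale.

```java
import java.util.Arrays;

/**
 * Sketch: assign each vector to its r nearest centroids and record only
 * (partitionId, docId) pairs as scratch instead of copying vectors.
 * Brute-force centroid search stands in for a centroid HNSW; names are hypothetical.
 */
public class PartitionAssignment {

    static float squaredL2(float[] a, float[] b) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /** Indices of the r centroids closest to v (requires r <= centroids.length). */
    public static int[] nearestCentroids(float[] v, float[][] centroids, int r) {
        Integer[] ids = new Integer[centroids.length];
        for (int i = 0; i < ids.length; i++) ids[i] = i;
        Arrays.sort(ids, (x, y) -> Float.compare(squaredL2(v, centroids[x]), squaredL2(v, centroids[y])));
        int[] out = new int[r];
        for (int i = 0; i < r; i++) out[i] = ids[i];
        return out;
    }

    /** Packs (partitionId, docId) into one long; sorting the longs groups entries by partition. */
    public static long pack(int partitionId, int docId) {
        return ((long) partitionId << 32) | (docId & 0xFFFFFFFFL);
    }

    /** Builds the sorted scratch array: vectors.length * r entries of 8 bytes each. */
    public static long[] assign(float[][] vectors, float[][] centroids, int r) {
        long[] scratch = new long[vectors.length * r];
        int k = 0;
        for (int doc = 0; doc < vectors.length; doc++) {
            for (int p : nearestCentroids(vectors[doc], centroids, r)) {
                scratch[k++] = pack(p, doc);
            }
        }
        Arrays.sort(scratch); // partition-major, then docId order
        return scratch;
    }

    public static void main(String[] args) {
        float[][] centroids = {{0f, 0f}, {10f, 0f}, {0f, 10f}};
        float[][] vectors = {{1f, 2f}, {9f, 1f}, {1f, 8f}};
        for (long e : assign(vectors, centroids, 2)) {
            System.out.println("partition=" + (e >>> 32) + " doc=" + (int) e);
        }
    }
}
```

Each scratch entry is 8 bytes regardless of dimension, versus dim × 4 bytes for a copied float32 vector, and sorting the packed longs groups entries by partition so posting lists can be written in one sequential pass that reads vectors by docId.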
--
This is an automated message from the Apache Git Service.