krickert commented on PR #15676: URL: https://github.com/apache/lucene/pull/15676#issuecomment-3874904257
# Summary

There is still a lot of trial and error here, but real-world tests with properly measured recall show that considerably more testing is required to improve HNSW search. Finding the optimal collaborative search parameters requires a systematic sweep across multiple variables. The test harness will run a full combinatorial grid over K-values, shard counts, slack thresholds, and datasets to discover a "best configuration" for each scenario within this data set. The goal is to keep the formula simple: we expect K and numShards to produce a plottable pattern that yields a clean dynamic slack function.

The primary dataset is built from Wikipedia embeddings. Suggestions for additional large-scale datasets are welcome, as effective testing requires indices large enough to stress the shard boundaries. Current tests are on a 16GB index. Before running the full grid, single-index recall is being validated against [luceneutil](https://github.com/mikemccand/luceneutil) to confirm that the collaborative mechanism introduces no regression at the individual shard level.

Below is the matrix of tests I'm going to plot against the data (a sketch of the sweep loop follows the table):

| Variable  | Values                            |
|-----------|-----------------------------------|
| K         | 10, 25, 50, 100, 250, 500, 1000   |
| numShards | 2, 4, 8, 16                       |
| slack     | 0, 0.001, 0.005, 0.01, 0.02, 0.05 |
| dataset   | sentences-1024, paragraphs-1024   |
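For reference, a minimal sketch of what that combinatorial sweep looks like. This is not the actual harness; `runTrial` and the `Trial` record are hypothetical stand-ins for whatever the real benchmark driver does:

```java
import java.util.List;

/** Hypothetical sketch of the parameter sweep; runTrial stands in for the real harness. */
public class CollaborativeSearchSweep {

  record Trial(int k, int numShards, double slack, String dataset) {}

  public static void main(String[] args) {
    List<Integer> ks = List.of(10, 25, 50, 100, 250, 500, 1000);
    List<Integer> shardCounts = List.of(2, 4, 8, 16);
    List<Double> slacks = List.of(0.0, 0.001, 0.005, 0.01, 0.02, 0.05);
    List<String> datasets = List.of("sentences-1024", "paragraphs-1024");

    // Full combinatorial grid: 7 * 4 * 6 * 2 = 336 configurations.
    for (String dataset : datasets) {
      for (int numShards : shardCounts) {
        for (int k : ks) {
          for (double slack : slacks) {
            runTrial(new Trial(k, numShards, slack, dataset));
          }
        }
      }
    }
  }

  static void runTrial(Trial t) {
    // Placeholder: index the dataset across t.numShards() shards, run the query set,
    // and record merged recall, average latency, and lookups saved for plotting.
  }
}
```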
# Test results so far

Discovered that duplicate documents were behind the low recall values, which invalidated the earlier multi-shard tests; I deduped the index and redid the baselines. The results so far suggest a classic trade-off: we lose recall as the pruning gets more aggressive. The results below show what happened when I loosened it up too much: a slower search that visits more nodes instead of fewer.

## 8-Shard Simulation Results

| K    | Mode          | Merged Recall | Avg Latency | Lookups Saved (Higher is Better) |
|------|---------------|---------------|-------------|----------------------------------|
| 10   | Baseline      | 0.9690        | 55 ms       | 1,446,967                        |
| 10   | Collaborative | 0.8864        | 50 ms       | 1,455,295 (+8k saved)            |
| 100  | Baseline      | 0.9885        | 84 ms       | 1,405,451                        |
| 100  | Collaborative | 0.9900        | 96 ms       | 1,354,091 (-51k saved)           |
| 1000 | Baseline      | 0.9969        | 279 ms      | 1,251,298                        |
| 1000 | Collaborative | 0.9995        | 620 ms      | 884,881 (-366k saved)            |

## Key Insights from the Run

1. **Recall Ceiling:** On deduped data, the baseline recall is excellent (0.97+). The collaborative version at $K=10$ still sees a drop of ~8 points, but at $K \ge 100$ it maintains (and slightly exceeds) baseline recall.
2. **The "Exploration Booster" Confirmed:** At $K=1000$, the collaborative search is doing significantly more work (620 ms vs 279 ms). This is the direct result of the $0.05$ safety slack and $2k$ warm-up guard being too generous for high-density searches: the search is "over-exploring" past the natural convergence point of standard HNSW.
3. **K=10 Sweet Spot:** For small $K$, the mechanism is actually working as intended for performance (faster than baseline), but the $0.05$ slack is still occasionally cutting off critical "bridge" paths needed to match the baseline's 0.97 recall.

## Analysis and Next Steps

I'm going to focus on changing the slack value. The fixed $0.05$ slack is the primary culprit for the performance regression at large $K$; it creates an "infinite budget" effect.

**Proposal:** We should now move to the **Dynamic Slack** implementation to tighten the pruning at high $K$ and find the "knee" in the recall curve:

```
Slack = BaseSlack * sqrt(numShards / K)
```

With $BaseSlack = 0.05$ and the 8-shard setup above, this makes the slack $\approx 0.04$ at $K=10$ but drops it to $\approx 0.004$ at $K=1000$, which should drastically cut the latency for large queries while keeping the recall gains we've seen.
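For concreteness, here is a minimal sketch of that dynamic slack function under the assumptions above (BaseSlack = 0.05; the class and method names are hypothetical, not the PR's actual API):

```java
/** Hypothetical sketch of the proposed dynamic slack; names are illustrative only. */
public final class DynamicSlack {

  private static final double BASE_SLACK = 0.05; // the fixed slack used in the runs above

  /** Shrinks the slack budget as K grows, scaled up gently with shard count. */
  public static double slackFor(int numShards, int k) {
    return BASE_SLACK * Math.sqrt((double) numShards / k);
  }

  public static void main(String[] args) {
    // Reproduces the values quoted above for the 8-shard setup:
    System.out.printf("K=10:   %.4f%n", slackFor(8, 10));   // ~0.0447
    System.out.printf("K=1000: %.4f%n", slackFor(8, 1000)); // ~0.0045
  }
}
```

The square root keeps the decay gradual: small-$K$ queries retain most of their exploration budget, while large-$K$ queries, which already converge well under standard HNSW, get pruned much harder.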
