krickert opened a new pull request, #15676: URL: https://github.com/apache/lucene/pull/15676
# [HNSW] Collaborative Search via Dynamic Threshold Feedback

> **Status: Experimental** - This change validates the pruning mechanism at the single-graph level, provides the collector infrastructure for multi-segment coordination within an index, and includes simulated multi-shard tests (multiple separate HNSW graphs and separate Directory instances combined via MultiReader) demonstrating cross-shard pruning gains. Full distributed multi-shard testing will happen in a follow-up PR for OpenSearch.

## Summary

Enable HNSW graph search to accept externally updated similarity thresholds during traversal. This allows multiple concurrent search processes (threads, shards, or nodes) to share a global minimum-score bar, pruning each other's search frontiers in real time.

## Problem Statement

In current distributed KNN implementations, each shard searches its local HNSW graph in isolation. A shard will continue exploring candidates even when other shards have already found globally superior matches. This redundant traversal wastes CPU and IO, and the cost scales with K and the number of shards.

## Proposed Changes

1. **Dynamic Threshold Re-fetching** (`HnswGraphSearcher.java`): Re-read `minCompetitiveSimilarity()` from the collector on every iteration of the HNSW search loop, rather than only at initialization. If the value has increased (due to an external update), `minAcceptedSimilarity` is raised and the search frontier is pruned accordingly.
2. **`CollaborativeKnnCollector`**: A `KnnCollector.Decorator` that wraps a standard `TopKnnCollector` and an `AtomicInteger` (storing float bits via `Float.floatToRawIntBits`). Its `minCompetitiveSimilarity()` returns `max(local, global)`, allowing external signals to raise the pruning bar. Updates use a lock-free CAS loop.
3. **`CollaborativeKnnCollectorManager`**: Creates per-segment `CollaborativeKnnCollector` instances that share a single `AtomicInteger`, enabling threshold propagation across leaf segments within a node.

## Test Results and Methodology

Unit tests measure the number of graph nodes visited under two conditions:

1. **Standard search**: A normal `TopKnnCollector` with no external threshold.
2. **Collaborative search**: A `CollaborativeKnnCollector` where the global bar is set using a score derived from the standard search's results (simulating a "discovered" top-K score).

| Scenario | Standard Visited | Collaborative Visited | Reduction |
| :--- | :--- | :--- | :--- |
| **Basic (K=10, 2-dim, 20K docs)** | 135 | 10 | **92.6%** |
| **High-Dimension (K=100, 128-dim, 10K docs)** | ~6,600 | ~400 | **~94%** |
| **High-K (K=1000, 16-dim, 30K docs)** | ~11,500 | ~350 | **~97%** |

Results vary across runs because Lucene's test framework randomizes graph construction parameters (maxConn, beamWidth). A subsequent run with smaller random values produced:

| Scenario | Standard Visited | Collaborative Visited | Reduction |
| :--- | :--- | :--- | :--- |
| **Basic (K=10, 2-dim, 20K docs)** | 59 | 43 | **~27%** |
| **High-Dimension (K=100, 128-dim, 10K docs)** | 2,620 | 55 | **~98%** |
| **High-K (K=1000, 16-dim, 30K docs)** | 8,762 | 1,756 | **~80%** |

The pruning is consistently effective across random seeds, with the strongest gains in high-dimension and high-K scenarios where graph traversal is most expensive. The basic scenario is more sensitive to graph topology: smaller graphs with fewer connections have less room to prune.
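The collaborative condition above relies entirely on the shared-bar mechanism from the proposed changes: a single `AtomicInteger` holding the global bar as raw float bits, raised monotonically with a CAS loop and combined with the collector's local bar via `max()`. The following is a minimal, standalone sketch of that mechanism; the class and method names here are illustrative and are not the PR's actual `CollaborativeKnnCollector` API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Standalone illustration of the shared-bar mechanism (not the PR's actual classes):
 * the global minimum-competitive similarity is stored as raw float bits in an
 * AtomicInteger, raised monotonically with a CAS loop, and combined with the
 * local collector's bar via max().
 */
final class SharedSimilarityBar {
  private final AtomicInteger globalBarBits =
      new AtomicInteger(Float.floatToRawIntBits(Float.NEGATIVE_INFINITY));

  /** Raise the global bar to {@code candidate}; the bar only ever goes up, never down. */
  void raiseTo(float candidate) {
    while (true) {
      int currentBits = globalBarBits.get();
      if (candidate <= Float.intBitsToFloat(currentBits)) {
        return; // the bar is already at least this high
      }
      if (globalBarBits.compareAndSet(currentBits, Float.floatToRawIntBits(candidate))) {
        return; // raised successfully
      }
      // another writer won the race; re-read and retry
    }
  }

  /** What a decorated minCompetitiveSimilarity() would return: the higher of local and global bars. */
  float effectiveMinSimilarity(float localMinCompetitiveSimilarity) {
    return Math.max(localMinCompetitiveSimilarity, Float.intBitsToFloat(globalBarBits.get()));
  }
}
```

In the PR itself, the decorator's `minCompetitiveSimilarity()` plays the role of `effectiveMinSimilarity()` (with the wrapped `TopKnnCollector` supplying the local bar), and `updateGlobalMinSimilarity()` plays the role of `raiseTo()`.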
### Important Caveats

These numbers represent **upper-bound** savings under idealized conditions:

- The global bar is set to a strong score *before* the collaborative search starts, simulating a scenario where another shard has already completed its search in full. In a real distributed system, the bar ramps up incrementally.
- There is no network round-trip latency in the test.
- The tests run against a single graph on a single node.

The fundamental mechanism - raising the threshold mid-traversal to skip provably non-competitive subgraphs - is not test-specific. Real-world savings should still be significant, particularly for high-K queries (K >= 100) and dense embedding spaces.

## Thread Safety

The implementation adds no locks or synchronization to the HNSW search hot path. Visibility of the shared threshold is guaranteed by the volatile read semantics of `AtomicInteger.get()`, which the collector calls on every loop iteration. Updates from external threads become visible on the next iteration without explicit memory fencing. The `CollaborativeKnnCollector.updateGlobalMinSimilarity()` method uses a standard CAS loop to ensure monotonic updates (the bar can only go up, never down).

## Design Note: Threshold Propagation Is External

The Lucene-layer change is intentionally passive: it *reads* the global bar but does not *write* it. Raising the bar based on incoming results from other shards is the responsibility of the orchestration layer. This keeps the Lucene change minimal and avoids coupling graph traversal logic to any specific coordination protocol.

## Use Case: Streaming Coordinator

This change is a prerequisite for high-performance distributed KNN search. In a streaming model, the coordinator can broadcast the current "Global Kth Score" back to all shards. Shards running this modified searcher will instantly prune their frontier, terminating their local search as soon as it is mathematically impossible to improve the global result set. A schematic sketch of this coordinator-side write path follows the multi-index results below.

## Multi-Index Performance Results

The single-graph tests above prove the mechanism works. The following tests measure what happens when collaborative pruning is applied across **multiple separate HNSW graphs** - the scenario that maps directly to cross-shard KNN search in OpenSearch.

### Test: Multi-Index High-K (low-level HNSW graphs)

5 separate HNSW graphs, 5000 vectors each (25K total), dim=32, K=500. Standard search queries each graph independently and merges. Collaborative search pre-sets the pruning bar to the median score of the merged top-500, then searches all 5 graphs with a shared `AtomicInteger`.

| Run | Standard Total Visited | Collaborative Total Visited | Reduction |
| :--- | :--- | :--- | :--- |
| 1 | 20,132 | 5,393 | **73.2%** |
| 2 | 20,263 | 4,417 | **78.2%** |

### Test: Multi-Index End-to-End (IndexSearcher + MultiReader)

5 separate Directory instances, 2000 vectors each (10K total), dim=32, K=100. Combined via `MultiReader` and searched through `IndexSearcher` - the same code path OpenSearch uses. Visited counts captured via `mergeLeafResults` override.

| Run | Standard Visited | Collaborative Visited | Reduction |
| :--- | :--- | :--- | :--- |
| 1 | 4,610 | 135 | **97.1%** |
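The multi-index tests above simulate the bar being raised externally (for example, pre-set from a prior merged result). In production, the write path would be driven by the coordinator described in the Streaming Coordinator use case. The sketch below is a schematic of that orchestration-side protocol only; it is not part of this PR and does not touch Lucene APIs. Each simulated shard task publishes its current Kth-best score to the shared bar (using the hypothetical `SharedSimilarityBar` from the earlier sketch), and every other task's reads reflect the raise on its next iteration.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Schematic of the coordinator-side write path (not part of this PR):
 * all shard-local search tasks share one bar; whenever a task improves its
 * local Kth-best score it raises the bar, and every other task observes the
 * new value on its next read, i.e. on its next HNSW loop iteration.
 */
public class CollaborativeCoordinatorSketch {
  public static void main(String[] args) throws InterruptedException {
    SharedSimilarityBar bar = new SharedSimilarityBar();

    // Pretend per-shard "Kth best so far" scores arriving over time.
    List<float[]> perShardScoreStreams =
        List.of(new float[] {0.61f, 0.74f, 0.80f},
                new float[] {0.55f, 0.70f},
                new float[] {0.68f, 0.69f, 0.91f});

    ExecutorService pool = Executors.newFixedThreadPool(perShardScoreStreams.size());
    for (float[] stream : perShardScoreStreams) {
      pool.submit(() -> {
        for (float kthBestSoFar : stream) {
          // Orchestration layer: publish the improved local bar.
          bar.raiseTo(kthBestSoFar);
          // Searcher side (inside the HNSW loop) would now prune below:
          float pruneAt = bar.effectiveMinSimilarity(/* local bar */ 0.0f);
          System.out.println(Thread.currentThread().getName() + " prunes below " + pruneAt);
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
    // The shared bar ends at the maximum published score (0.91f here) and never decreases.
  }
}
```

Because the Lucene-side change re-reads `minCompetitiveSimilarity()` on every HNSW iteration, a raise published by one task prunes the other tasks' frontiers on their very next step; no additional synchronization is needed beyond the atomic read.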
### What this means for high-K cross-shard search

The cost of KNN search scales with K and the number of shards. Without collaborative pruning, each shard does full work independently - a K=2000 query across 20 shards means 20 full HNSW traversals with no shared knowledge.

With collaborative pruning, the bar rises as soon as *any* shard finds good results, and every other shard prunes accordingly. The effect compounds: more shards means the bar rises faster, which means more pruning per shard.

The numbers above are from K=100 and K=500 across 5 graphs. At K=2000 across 20 shards, the pruning surface is larger and the ratio of useful-to-wasted traversal work is worse in the standard case - which means collaborative pruning has even more room to cut. Queries that are currently too expensive to run (high K, many shards, high-dimensional embeddings) become feasible.
