Re: [PR] Avoid reconstructing HNSW graphs during segment merging. [lucene]

via GitHub Wed, 05 Nov 2025 18:19:52 -0800


Pulkitg64 commented on PR #15003:
URL: https://github.com/apache/lucene/pull/15003#issuecomment-3494558786


   Ran `knnPerfTest` to compare baseline (current main branch) against 
candidate (this PR). The candidate implementation includes graph reconnection 
and rebalancing after deleted nodes are dropped. Benchmarks were performed with 
various deletion percentages and `maxConn` values using a single merge thread 
(no concurrency).
   
   ### Summary
   
   The candidate completes `forceMerge` faster than baseline up to 
`maxConn=32` (1.13x to 3.9x in the table below) without significant recall 
degradation. However, at `maxConn=64`, merge time is comparable to baseline 
(0.95x to 1.00x), likely because:
   
   1. A high number of nodes require reconnection
   2. The dense graph structure increases candidate search time
   
   **Current Reconnection Strategy:**
   
   We reconnect nodes when their connection count falls below 50% of the 
maximum allowed:
   - **Level > 0**: Reconnect if connections < `maxConn / 2`
   - **Level 0**: Reconnect if connections < `(2 × maxConn) / 2` = `maxConn`
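The threshold rule above can be sketched as follows. This is only an illustration of the rule as described in this comment, not the PR's actual code; the function names `reconnect_threshold` and `needs_reconnect` are hypothetical.

```python
def reconnect_threshold(max_conn: int, level: int) -> int:
    """Connection count below which a node is reconnected (rule as described above)."""
    # Level 0 allows up to 2 * max_conn neighbors; higher levels allow max_conn.
    max_allowed = 2 * max_conn if level == 0 else max_conn
    # Reconnect when a node keeps fewer than 50% of the allowed connections.
    return max_allowed // 2


def needs_reconnect(num_connections: int, max_conn: int, level: int) -> bool:
    """True if a node has too few surviving neighbors at this level."""
    return num_connections < reconnect_threshold(max_conn, level)
```

For example, with `maxConn=64` a level-0 node with 61 neighbors falls under the threshold of 64 and would be reconnected, while a level-1 node with 40 neighbors (threshold 32) would not.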
   
   **Observation at maxConn=64:**
   
   At level 0, at least 90% of nodes have fewer than 64 connections (p90=61), 
triggering reconnection for nearly all of them. (**I think we can try relaxing 
the 50% threshold.**)
   
   ```
   Node connections stats:
   Graph level=2 size=21, avg=4.67, p5=2, p10=2, p20=3, p30=3, p50=4, p90=8, 
p99=9, max=9
   Graph level=1 size=1407, avg=27.91, p5=15, p10=17, p20=20, p30=22, p50=27, 
p90=40, p99=54, max=61
   Graph level=0 size=90050, avg=33.03, p5=11, p10=13, p20=17, p30=20, p50=27, 
p90=61, p99=112, max=128
   ```
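As a rough check of the observation, the reported percentiles give a lower bound on the share of nodes that fall under a given threshold. The sketch below assumes each `pN` value means roughly N% of nodes have at most that many connections; the helper name `min_share_below` is hypothetical.

```python
def min_share_below(percentiles: dict[int, int], threshold: int) -> int:
    """Largest reported percentile whose value is strictly below the threshold.

    Gives an approximate lower bound (in percent) on the share of nodes
    that would trigger reconnection at that threshold.
    """
    below = [p for p, value in percentiles.items() if value < threshold]
    return max(below, default=0)


# Percentiles copied from the stats above (maxConn=64).
level0 = {5: 11, 10: 13, 20: 17, 30: 20, 50: 27, 90: 61, 99: 112}
level1 = {5: 15, 10: 17, 20: 20, 30: 22, 50: 27, 90: 40, 99: 54}

print(min_share_below(level0, 64))  # 90: current level-0 threshold (maxConn)
print(min_share_below(level1, 32))  # 50: current level-1 threshold (maxConn / 2)
print(min_share_below(level0, 32))  # 50: hypothetical relaxed level-0 threshold
```

The last line uses a purely hypothetical relaxed threshold of 32 at level 0 to show how the bound would move; it is not a value proposed or measured in this PR.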
   
   | Delete % | MaxConn | Baseline Recall | Baseline ForceMergeTime (sec) | Candidate Recall | Candidate ForceMergeTime (sec) | Recall Change | ForceMerge Time Change (speedup) |
   |----------|---------|-----------------|-------------------------------|------------------|--------------------------------|---------------|----------------------------------|
   | 10       | 8       | 0.796           | 73.97                         | 0.783            | 18.78                          | -1.63%        | 3.9x                             |
   | 10       | 16      | 0.89            | 109.05                        | 0.884            | 46.07                          | -0.67%        | 2.3x                             |
   | 10       | 32      | 0.926           | 136.24                        | 0.923            | 101.07                         | -0.32%        | 1.35x                            |
   | 10       | 64      | 0.935           | 148.34                        | 0.935            | 155.45                         | 0.00%         | 0.95x                            |
   | 20       | 8       | 0.8             | 63.94                         | 0.783            | 18.29                          | -2.13%        | 3.5x                             |
   | 20       | 16      | 0.895           | 95.34                         | 0.881            | 44.89                          | -1.56%        | 2.1x                             |
   | 20       | 32      | 0.931           | 116.78                        | 0.924            | 91.88                          | -0.75%        | 1.28x                            |
   | 20       | 64      | 0.938           | 130.33                        | 0.935            | 135.48                         | -0.32%        | 0.96x                            |
   | 30       | 8       | 0.808           | 55.11                         | 0.784            | 18.66                          | -2.97%        | 2.9x                             |
   | 30       | 16      | 0.9             | 83.11                         | 0.885            | 42.96                          | -1.67%        | 1.9x                             |
   | 30       | 32      | 0.935           | 100.64                        | 0.927            | 81.03                          | -0.86%        | 1.24x                            |
   | 30       | 64      | 0.942           | 111.51                        | 0.938            | 111.25                         | -0.42%        | 1.002x                           |
   | 50       | 8       | 0.822           | 38.69                         | 0.798            | 23.39                          | -2.92%        | 1.65x                            |
   | 50       | 16      | 0.914           | 56.18                         | 0.87             | 42.85                          | -4.81%        | 1.3x                             |
   | 50       | 32      | 0.943           | 70.05                         | 0.935            | 61.89                          | -0.85%        | 1.13x                            |
   | 50       | 64      | 0.951           | 72.26                         | 0.942            | 71.55                          | -0.95%        | 1.00x                            |
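For reference, the two derived columns appear to be computed as a relative recall change and a baseline-to-candidate time ratio. The sketch below reproduces the first row under that assumption; the function names are hypothetical and not part of `knnPerfTest`.

```python
def recall_change_pct(baseline_recall: float, candidate_recall: float) -> float:
    """Relative recall change of the candidate versus baseline, in percent."""
    return (candidate_recall - baseline_recall) / baseline_recall * 100


def merge_speedup(baseline_sec: float, candidate_sec: float) -> float:
    """How many times faster the candidate force merge is (values > 1 favor the candidate)."""
    return baseline_sec / candidate_sec


# First row of the table: 10% deletes, maxConn=8.
print(round(recall_change_pct(0.796, 0.783), 2))  # -1.63
print(round(merge_speedup(73.97, 18.78), 1))      # 3.9
```

Under this reading, a value such as 0.95x at `maxConn=64` means the candidate merge took slightly longer than baseline.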
   


