Pulkitg64 commented on PR #15003:
URL: https://github.com/apache/lucene/pull/15003#issuecomment-3494558786

Ran `knnPerfTest` to compare baseline (current main branch) against candidate (this PR). The candidate implementation includes graph reconnection and rebalancing after deleted nodes are dropped. Benchmarks were performed with various deletion percentages and `maxConn` values using a single merge thread (no concurrency).

### Summary

The candidate shows better `forceMerge` performance than baseline up to `maxConn=32` without significant recall degradation. However, at `maxConn=64`, performance is comparable to baseline, likely because:

1. A high number of nodes require reconnection
2. The dense graph structure increases candidate search time

**Current Reconnection Strategy:** We reconnect nodes when their connection count falls below 50% of the maximum allowed:

- **Level > 0**: Reconnect if connections < `maxConn / 2`
- **Level 0**: Reconnect if connections < `(2 × maxConn) / 2` = `maxConn`

**Observation at maxConn=64:** At level 0, more than 90% of nodes (p90) have fewer than 64 connections, triggering reconnection for nearly all of them.
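
For reference, the threshold rule above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the class and method names (`ReconnectThreshold`, `maxAllowed`, `needsReconnect`) are hypothetical, and it assumes level 0 allows `2 × maxConn` connections as described above. The sample connection counts are taken from the node connection stats in this comment.

```java
/** Hypothetical sketch of the 50% reconnection threshold; not the PR's actual code. */
public final class ReconnectThreshold {

  private ReconnectThreshold() {}

  /** Maximum connections allowed at a level: 2 * maxConn at level 0, maxConn above it. */
  public static int maxAllowed(int level, int maxConn) {
    return level == 0 ? 2 * maxConn : maxConn;
  }

  /** True when a node has fewer than 50% of the connections allowed at its level. */
  public static boolean needsReconnect(int level, int connectionCount, int maxConn) {
    return connectionCount < maxAllowed(level, maxConn) / 2;
  }

  public static void main(String[] args) {
    int maxConn = 64;
    // Level 0: threshold is (2 * 64) / 2 = 64, so a node with 61 connections is reconnected.
    System.out.println(needsReconnect(0, 61, maxConn));
    // Level 1: threshold is 64 / 2 = 32, so a node with 40 connections is left alone.
    System.out.println(needsReconnect(1, 40, maxConn));
  }
}
```

Relaxing the threshold would amount to replacing the `/ 2` with a smaller fraction, which should reduce the number of level-0 nodes that qualify at `maxConn=64`.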
(**I think we can try relaxing the 50% threshold.**)

```
Node connections stats:
Graph level=2 size=21, avg=4.67, p5=2, p10=2, p20=3, p30=3, p50=4, p90=8, p99=9, max=9
Graph level=1 size=1407, avg=27.91, p5=15, p10=17, p20=20, p30=22, p50=27, p90=40, p99=54, max=61
Graph level=0 size=90050, avg=33.03, p5=11, p10=13, p20=17, p30=20, p50=27, p90=61, p99=112, max=128
```

| Delete % | MaxConn | Baseline Recall | Baseline ForceMergeTime (sec) | Candidate Recall | Candidate ForceMergeTime (sec) | Recall Change | ForceMerge Time Change (baseline / candidate) |
|----------|---------|-----------------|-------------------------------|------------------|--------------------------------|---------------|-----------------------------------------------|
| 10 | 8 | 0.796 | 73.97 | 0.783 | 18.78 | -1.63% | 3.9x |
| 10 | 16 | 0.89 | 109.05 | 0.884 | 46.07 | -0.67% | 2.3x |
| 10 | 32 | 0.926 | 136.24 | 0.923 | 101.07 | -0.32% | 1.35x |
| 10 | 64 | 0.935 | 148.34 | 0.935 | 155.45 | 0.00% | 0.95x |
| 20 | 8 | 0.8 | 63.94 | 0.783 | 18.29 | -2.13% | 3.5x |
| 20 | 16 | 0.895 | 95.34 | 0.881 | 44.89 | -1.56% | 2.1x |
| 20 | 32 | 0.931 | 116.78 | 0.924 | 91.88 | -0.75% | 1.28x |
| 20 | 64 | 0.938 | 130.33 | 0.935 | 135.48 | -0.32% | 0.96x |
| 30 | 8 | 0.808 | 55.11 | 0.784 | 18.66 | -2.97% | 2.9x |
| 30 | 16 | 0.9 | 83.11 | 0.885 | 42.96 | -1.67% | 1.9x |
| 30 | 32 | 0.935 | 100.64 | 0.927 | 81.03 | -0.86% | 1.24x |
| 30 | 64 | 0.942 | 111.51 | 0.938 | 111.25 | -0.42% | 1.002x |
| 50 | 8 | 0.822 | 38.69 | 0.798 | 23.39 | -2.92% | 1.65x |
| 50 | 16 | 0.914 | 56.18 | 0.87 | 42.85 | -4.81% | 1.3x |
| 50 | 32 | 0.943 | 70.05 | 0.935 | 61.89 | -0.85% | 1.13x |
| 50 | 64 | 0.951 | 72.26 | 0.942 | 71.55 | -0.95% | 1.00x |

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
