leng25 opened a new pull request, #15790:
URL: https://github.com/apache/lucene/pull/15790
## Summary
This PR implements the optimization suggested in #15024, replacing the
two-step prefix sum loop in `Lucene99HnswVectorsReader` with a single-pass
accumulator variant that avoids redundant memory reads.
**Before:**
```java
currentNeighborsBuffer[0] = dataIn.readVInt();
for (int i = 1; i < arcCount; i++) {
  currentNeighborsBuffer[i] = currentNeighborsBuffer[i - 1] + dataIn.readVInt();
}
```
**After:**
```java
int sum = 0;
for (int i = 0; i < arcCount; i++) {
  sum += dataIn.readVInt();
  currentNeighborsBuffer[i] = sum;
}
```
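As a standalone sketch of why the two loops are interchangeable (hypothetical names; `deltas` stands in for the VInt-encoded gaps read from `dataIn`, and this is not the actual Lucene reader code), both variants produce the same prefix sums; the single-pass form just keeps the running total in a local instead of re-reading `buffer[i - 1]` each iteration:

```java
import java.util.Arrays;

public class PrefixSumSketch {

  // Two-step variant: seed element 0, then read back the previous element
  // from the array on every iteration.
  static int[] twoStep(int[] deltas) {
    int[] out = new int[deltas.length];
    out[0] = deltas[0];
    for (int i = 1; i < deltas.length; i++) {
      out[i] = out[i - 1] + deltas[i];
    }
    return out;
  }

  // Single-pass variant: accumulate in a local, avoiding the array read.
  static int[] singlePass(int[] deltas) {
    int[] out = new int[deltas.length];
    int sum = 0;
    for (int i = 0; i < deltas.length; i++) {
      sum += deltas[i];
      out[i] = sum;
    }
    return out;
  }

  public static void main(String[] args) {
    int[] deltas = {3, 1, 4, 1, 5};
    System.out.println(Arrays.equals(twoStep(deltas), singlePass(deltas)));
  }
}
```

The JIT may well optimize both forms similarly, which is consistent with the near-identical benchmark numbers below.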
This is a follow-up to #15027 by @yossev, who proposed the same fix. Since
that PR went stale (merge conflicts, formatting issues), I'm resubmitting with
the conflicts resolved, formatting fixed via `./gradlew tidy`, and benchmark
results included.
I found this while looking for a good first issue to learn the contribution
process — happy to adjust anything based on feedback!
## Benchmark Results
Benchmarks were run using
[luceneutil](https://github.com/mikemccand/luceneutil) KNN benchmark
(`knnPerfTest.py`).
**Machine:** Intel Core i5-10210U, 8 logical cores, ~15 GB RAM
**Dataset:** cohere-v3-wikipedia-en 1024d, 400k docs, 10k queries, 8-bit
quantized, dot_product
**Baseline:**
```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)
0.977   9.920        9.893   0.997        400000  100   100     64       250        8 bits     7955     486.32    822.50        437.90          1             2015.68
```
**Candidate (this PR):**
```
recall  latency(ms)  netCPU  avgCpuCount  nDoc    topK  fanout  maxConn  beamWidth  quantized  visited  index(s)  index_docs/s  force_merge(s)  num_segments  index_size(MB)
0.977   9.861        9.833   0.997        400000  100   100     64       250        8 bits     7955     486.32    822.50        437.90          1             2015.68
```
Recall is identical. Results are from a single run, so the small latency
difference may fall within normal measurement variance.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]