abernardi597 commented on PR #15472: URL: https://github.com/apache/lucene/pull/15472#issuecomment-4012846854
Here are some of the results from my benchmarking on the Cohere V3 dataset.

## Indexing

On a `m8g.16xlarge` EC2 instance running AL2, I indexed 5M documents using 16 indexing threads and 48 merge threads with both Lucene's HNSW codec and the candidate JVector codec over a range of compression levels. For HNSW indices, compression uses Optimized Scalar Quantization (OSQ), while JVector relies on Product Quantization (PQ). Lucene's OSQ supports 8-bit, 4-bit, and 1-bit quantization, achieving 4x, 8x, and 32x compression, respectively. PQ supports the same compression levels by setting the number of subvectors appropriately (i.e. 1-dim subvectors = 4x compression, 2-dim subvectors = 8x, etc.). All indices were created with `maxConn=64` and `beamWidth=250` using the COSINE similarity metric. For JVector, I set `alpha=1.0`, `neighborOverflow=1.2`, and `useHierarchy=true` to generate graphs with a structure similar to HNSW and mitigate differences in recall due to graph quality.

The graph above shows the wall-clock time to index all 5M documents, where the slightly transparent section of each bar denotes the additional time taken to force-merge the segments into a single graph. Overall, indexing time is quite comparable for 1-bit quantization and for full-precision vectors, but Lucene shows much better performance with 4-bit and 8-bit quantization. In every case, however, JVector spends an inordinate amount of time merging segments compared to HNSW. The discrepancy is likely due to a key optimization in Lucene's HNSW implementation that reuses adjacency information from the incoming graphs during merge, reducing the search cost of inserting most nodes into the merged graph. I did not port this functionality, but I am interested in contributing it upstream.

## Searching

The following search benchmarks perform 10k queries across 8 threads using Lucene's `MMapDirectory` to memory-map the index files.
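As an aside, the compression ratios quoted for OSQ and PQ in the indexing section fall out of simple bits-per-dimension arithmetic. A standalone sketch (not tied to either implementation; 1024 dimensions assumed, which I believe matches Cohere V3 embeddings, and 256-centroid PQ codebooks, i.e. one byte per subvector):

```python
BITS_FP32 = 32  # bits per dimension for a full-precision float vector

def osq_compression(bits_per_dim: int) -> float:
    """Ratio of full-precision size to scalar-quantized size,
    ignoring per-vector metadata such as correction terms."""
    return BITS_FP32 / bits_per_dim

def pq_compression(dim: int, subvector_dim: int, codebook_bits: int = 8) -> float:
    """Each subvector is replaced by one codebook index
    (8 bits for a 256-centroid codebook)."""
    num_subvectors = dim // subvector_dim
    return (dim * BITS_FP32) / (num_subvectors * codebook_bits)

# OSQ: 8-bit -> 4x, 4-bit -> 8x, 1-bit -> 32x
print([osq_compression(b) for b in (8, 4, 1)])        # [4.0, 8.0, 32.0]
# PQ on 1024-dim vectors: 1-dim subvectors -> 4x, 2-dim -> 8x, 8-dim -> 32x
print([pq_compression(1024, d) for d in (1, 2, 8)])   # [4.0, 8.0, 32.0]
```

This is why a given compression target can be matched across both codecs by choosing the OSQ bit width and the PQ subvector dimensionality accordingly.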
The `topK=100` nearest neighbors for each target vector are compared against a ground truth computed by brute force to measure recall. For each index, I ran the benchmarks with `overSample=1..5`, both with and without reranking. The benchmarks execute each query once as a warm-up before measuring anything.

To get an idea of best-case performance, I first ran the search benchmarks on the same `m8g.16xlarge` instance I used to build the index, since it can hold the entire index in memory and thus simulates a "hot" index. I used the Lucene preload API to ensure that the index files were loaded into memory. To simulate a "warm" index, I also ran the benchmarks in a memory-constrained environment: a `c8gd.xlarge` instance running AL2023, with the index files copied onto the direct-attached SSD formatted with xfs. The benchmarks do not attempt to preload the index files in this case, since they cannot all fit in the machine's limited memory. Moreover, `MADV_RANDOM` is used for the graph index and vector files to minimize IO saturation from overzealous kernel prefetching, since disk access is mostly random.

This plot of recall against average CPU time per query illustrates how over-sampling and reranking affect latency and recall. Each line connects the points with increasing over-sampling; dashed lines connect the data points with reranking, while solid lines connect those without. From this we can see that Lucene generally outperforms JVector for "hot" indices. JVector does seem to benefit more from reranking when the first phase lacks full-precision vectors, though this may be an artifact of PQ more than anything. It is also notable that JVector at 32x compression has lower latencies than HNSW, but again it's hard to say whether that is a win for the JVector algorithm rather than for its implementation of PQ. For the "warm" index benchmarks, JVector's performance appears less affected by reranking.
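For clarity, the oversample-then-rerank procedure and the recall metric being measured can be sketched as follows. This is hypothetical illustrative Python, not the benchmark code: real implementations score candidates during a graph traversal rather than scoring every vector, but the two-phase structure is the same.

```python
import numpy as np

def recall_at_k(retrieved, ground_truth, k=100):
    """Fraction of the true top-k neighbors recovered in the retrieved top-k."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

def search_with_rerank(query, approx_scores, full_vectors, k=100, over_sample=3):
    """Phase 1: take the k * over_sample best candidates by approximate
    (quantized) score. Phase 2: re-score only that pool with the
    full-precision vectors and keep the top k."""
    pool = np.argsort(-approx_scores)[: k * over_sample]
    exact = full_vectors[pool] @ query
    return pool[np.argsort(-exact)[:k]]
```

Without reranking, phase 2 is skipped and the approximate ordering is returned directly, which is why larger `overSample` values trade extra phase-1 work (and, for reranking, extra full-precision reads) for higher recall.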
This would indicate the advantage of DiskANN-style reranking, where full-precision vectors are gathered inline with the graph search (relying on page locality to get them for "free") rather than in an explicit second phase. At the time of writing, however, it seems that JVector also [performs re-ranking in a second phase](https://github.com/datastax/jvector/blob/3ca3ce10b20d807c95f4843cc989ae0faf712a54/jvector-base/src/main/java/io/github/jbellis/jvector/graph/GraphSearcher.java#L229), like Lucene's HNSW. This leads me to believe the discrepancy comes from how JVector explicitly loads quantized vectors onto the heap, whereas HNSW memory-maps them. With fewer disk accesses to juggle, the page cache may keep the full-precision vectors (which locality suggests are also touched while traversing the index) resident more effectively, even though the larger heap leaves it less memory to work with.

---

Overall, it seems Lucene HNSW is more competitive with JVector than [some benchmarks](https://github.com/opensearch-project/opensearch-jvector/blob/main/README.md#benchmarks) would suggest. Even so, it could still be worth adding JVector to the sandbox as an alternative KNN engine, or perhaps incorporating some of its ideas/features into Lucene. I will update this PR with a non-draft revision, pending some cleanup of my commits and `knnPerfTest.py` PRs in `luceneutil`.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
