Re: Do we know why Lucene's HNSW may be slower than other HNSW implementations?

Chris Hegarty Mon, 07 Jul 2025 13:43:20 -0700

Thanks Trevor, this is helpful.  It also reminded me to reply here too on the 
off-heap scoring.


Scoring off-heap, and thus avoiding a copy, gains approx 2x in the vector 
comparisons - this is quite substantial and maybe align with what Trevor sees? 
However, because of a Hotspot bug, using MemorySegment is not yet an option for 
scoring float32’s off-heap [1]. We’ll get the Hotspot bug fixed, but in the 
mean time to help evaluate the potential gain I just wrote native float32 
vector comparison operations to see how they perform in Elasticsearch [2].

-Chris.

[1] https://mail.openjdk.org/pipermail/panama-dev/2025-June/021070.html
[2] https://github.com/elastic/elasticsearch/pull/130541

> On 7 Jul 2025, at 21:01, Trevor McCulloch 
> <[email protected]> wrote:
> 
> For comparison I put together a codec that is a copy of Lucene99 but with 
> most of KnnVectorsReader.search() implemented in native code and called via 
> FFM as a way of examining overhead from the JVM. I didn't have the same data 
> set, but on 1M 1536d float vectors with default lucene99 settings it was 
> about 10% faster in native code -- not nothing, but not very much considering 
> the additional complexity of using native code. I was able to avoid copying 
> data from the mmap backing store in order to perform distance comparisons 
> which was probably a significant chunk of the win. The vast majority of CPU 
> time (~80%) is spent doing vector comparison and something like 95-96% of 
> that CPU time is spent scoring in L0. Decoding edges from the graph is ~1.5%. 
> I think so long as vector scoring code is competitive and Lucene is doing a 
> similar number of comparisons the margin should be pretty close.
> 
> I looked through their open source implementation and did not see anything 
> that led me to believe that their HNSW implementation is substantially 
> different in an algorithmic sense. They did have some logic around choosing 
> different representations of the visited set depending on the expected number 
> of nodes to visit (choosing between ~FixedBitSet and a hash set).
> 
> On Mon, Jun 23, 2025 at 6:10 AM Benjamin Trent <[email protected]> wrote:
> To my knowledge, FAISS isn't utilizing hand-rolled SIMD calculations. Do we 
> know if it was compiled with `--ffast-math`?
> 
> Vespa does utilize SIMD optimizations for vector comparisons. 
> 
> Some more ways I think Lucene is slower (though, I am not sure the 2x is 
> fully explained):
> 
>  - Reading floats onto heap float[] instead of accessing Memory Segments 
> directly when scoring
>  - We store the graph in a unique way that requires a decoding step when 
> exploring a new candidate, reading in vints and doing a binary search. I 
> think all other hnsw impls do flat arrays of int/long values.
>  - We always use SparseBitSet, which for smaller indices <1M can have a 
> noticeable impact on performance. I have seen this in my own benchmarking 
> (switching to fixedbitset measurably improved query times on smaller data 
> sets)
> 
> Both of these are fairly "cheap". Which might explain the FAISS 10% 
> difference. However, I am not sure they fully explain the 2x difference with 
> vespa.
> 
> On Thu, Jun 19, 2025 at 3:37 PM Adrien Grand <[email protected]> wrote:
> Thanks Mike, this is useful information. Then I'll try to reproduce this 
> benchmark to better understand what is happening.
> 
> On Thu, Jun 19, 2025 at 8:16 PM Michael Sokolov <[email protected]> wrote:
> We've recently been comparing Lucene's HNSW w/FAISS' and there is not
> a 2x difference there. FAISS does seem to be around 10-15% faster I
> think?  The 2x difference is roughly what I was seeing in comparisons
> w/hnswlib prior to the dot-product improvements we made in Lucene.
> 
> On Thu, Jun 19, 2025 at 2:12 PM Adrien Grand <[email protected]> wrote:
> >
> > Chris,
> >
> > FWIW I was looking at luceneknn 
> > (https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/luceneknn/Dockerfile#L15)
> >  which is on 9.7, though I don't know if it enabled the incubating vector 
> > API at runtime?
> >
> > I hope that mentioning ANN benchmarks did not add noise to this thread, I 
> > was mostly looking at whether I could find another benchmark that suggests 
> > that Lucene is significantly slower in similar conditions. Does it align 
> > with other people's experience that Lucene is 2x slower or more compared 
> > with other good HNSW implementations?
> >
> > Adrien
> >
> > Le jeu. 19 juin 2025, 18:44, Chris Hegarty 
> > <[email protected]> a écrit :
> >>
> >> Hi Adrien,
> >>
> >> > Even though it uses Elasticsearch to run the benchmark, it really 
> >> > benchmarks Lucene functionality,
> >>
> >> Agreed.
> >>
> >> > This seems consistent with results from 
> >> > https://ann-benchmarks.com/index.html though I don't know if the cause 
> >> > of the performance difference is the same or not.
> >>
> >> On ann-benchmarks specifically. Unless I’m looking in the wrong place, 
> >> then they’re using Elasticsearch 8.7.0 [1], which predates our usage of 
> >> the Panama Vector API for vector search. We added support for that in 
> >> Lucene 9.7.0 -> Elasticsearch 8.9.0.  So those benchmarks are wildly out 
> >> of date, no ?
> >>
> >> -Chris.
> >>
> >> [1] 
> >> https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/elasticsearch/Dockerfile#L2
> >>
> >>
> >> > On 19 Jun 2025, at 16:39, Adrien Grand <[email protected]> wrote:
> >> >
> >> > Hello all,
> >> >
> >> > I have been looking at this benchmark against Vespa recently: 
> >> > https://blog.vespa.ai/elasticsearch-vs-vespa-performance-comparison/. 
> >> > (The report is behind an annoying email wall, but I'm copying relevant 
> >> > data below, so hopefully you don't need to download the report.) Even 
> >> > though it uses Elasticsearch to run the benchmark, it really benchmarks 
> >> > Lucene functionality, I don't believe that Elasticsearch does anything 
> >> > that meaningfully alters the results that you would get if you were to 
> >> > run Lucene directly.
> >> >
> >> > The benchmark seems designed to highlight the benefits of Vespa's 
> >> > realtime design, that's fair game I guess. But it also runs some queries 
> >> > in read-only scenarios when I was expecting Lucene to perform better.
> >> >
> >> > One thing that got me curious is that it reports about 2x worse latency 
> >> > and throughput for pure unfiltered vector search on a force-merged index 
> >> > (so single segment/graph). Does anybody know why Lucene's HNSW may 
> >> > perform slower than Vespa's HNSW? This seems consistent with results 
> >> > from https://ann-benchmarks.com/index.html though I don't know if the 
> >> > cause of the performance difference is the same or not.
> >> >
> >> > For reference, here are details that apply to both Lucene and Vespa's 
> >> > vector search:
> >> >  - HNSW,
> >> >  - float32 vectors, no quantization,
> >> >  - embeddings generated using  Snowflake's Arctic-embed-xs model
> >> >  - 1M docs
> >> >  - 384 dimensions,
> >> >  - dot product,
> >> >  - m = 16,
> >> >  - max connections = 200,
> >> >  - search for top 10 hits,
> >> >  - no filter,
> >> >  - single client, so no search concurrency,
> >> >  - purple column is force-merged, so single segment/graph like Vespa.
> >> >
> >> > I never seriously looked at Lucene's vector search performance, so I'm 
> >> > very happy to be educated if I'm making naive assumptions!
> >> >
> >> > Somewhat related, is this the reason why I'm seeing many threads around 
> >> > bringing 3rd party implementations into Lucene, including ones that are 
> >> > very similar to Lucene on paper? To speed up vector search?
> >> >
> >> > --
> >> > Adrien
> >> > <vespa-vs-es-screenshot.png>
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: [email protected]
> >> > For additional commands, e-mail: [email protected]
> >>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
> >>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 
> 
> 
> -- 
> Adrien


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: Do we know why Lucene's HNSW may be slower than other HNSW implementations?

Reply via email to