Thanks Trevor,

> On 8 Jul 2025, at 16:46, Trevor McCulloch <trevor.mccull...@mongodb.com.INVALID> wrote:
>
> Hey Chris,
>
> I was looking at your microbenchmark results [1] and noticed that dot product times are lower than a typical L3 cache miss time of ~150ns. This may align with our results: a native SIMD-accelerated scorer off heap is a lot faster, but the performance in an at-scale workload isn't substantially better because the improvement gets eaten by memory latency when loading document vectors.

Yeah, I had the same thought.

> HNSW's access pattern is random so this feels likely. To that end I got an additional ~10% improvement in my macro benchmark by inserting prefetching intrinsics right before scoring [2]. It would certainly be possible to do this directly using cpu native intrinsics (_mm_prefetch on x86_64 or _prefetch on aarch64) in your native scorer.

In a previous version I was prefetching, but then removed it. I think I need to use a larger dataset for this micro-benchmark.
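
Roughly the shape of what I had, and what I'd put back, in case it's useful. This is only an untested sketch with made-up names (prefetch_vector, dot_f32, score_candidate), not the code that is actually in the PR:

#include <stddef.h>
#if defined(__x86_64__)
#include <immintrin.h>   /* _mm_prefetch / _MM_HINT_T0 */
#endif

/* Issue a prefetch for each cache line of a float32 vector (assuming
 * 64-byte lines) so the line fetches overlap, instead of arriving one
 * loop iteration at a time while the dot product walks the vector. */
void prefetch_vector(const float *v, size_t dims) {
    const char *p = (const char *)v;
    for (size_t off = 0; off < dims * sizeof(float); off += 64) {
#if defined(__x86_64__)
        _mm_prefetch(p + off, _MM_HINT_T0);
#else
        /* gcc/clang builtin on other architectures, e.g. aarch64 */
        __builtin_prefetch(p + off, 0 /* read */, 3 /* keep in cache */);
#endif
    }
}

/* Stand-in for the real SIMD kernel: plain float32 dot product. */
float dot_f32(const float *a, const float *b, size_t dims) {
    float acc = 0.0f;
    for (size_t i = 0; i < dims; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Right before scoring a candidate that lives in the mmap'ed vector data. */
float score_candidate(const float *query, const float *candidate, size_t dims) {
    prefetch_vector(candidate, dims);
    return dot_f32(query, candidate, dims);
}

How much the hint buys presumably depends on how far ahead of the actual loads it lands, which is another reason to measure this on a larger dataset.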

> I'll try to find some time this week to try this further up in the HNSW traversal loop. Since we are effectively doing one-to-many scoring against a list of vertices, it might be faster to loop twice: once to check the visited set and prefetch anything that will be scored, and then again to score. I should also stand this up on a linux box as perf counters may be helpful here.

Interesting idea!
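
If it helps, here is roughly how I picture the two-pass version. Again this is only a sketch, assuming a made-up flat layout (contiguous float32 vectors, int32 neighbour ids, a plain bitset for the visited set) rather than Lucene's or [2]'s actual data structures; prefetch_vector() and dot_f32() are the ones from the sketch above:

#include <stddef.h>
#include <stdint.h>

/* prefetch_vector() and dot_f32() as in the earlier sketch. */
void  prefetch_vector(const float *v, size_t dims);
float dot_f32(const float *a, const float *b, size_t dims);

/* Mark a node as visited; return non-zero if it had already been visited. */
static int test_and_set_visited(uint64_t *visited, int32_t node) {
    uint64_t mask = 1ULL << (node & 63);
    int seen = (visited[node >> 6] & mask) != 0;
    visited[node >> 6] |= mask;
    return seen;
}

/* Pass 1: skip already-visited neighbours and prefetch the vectors of the
 * rest. Pass 2: score them, by which point the earlier line fetches have
 * had time to complete. Returns the number of neighbours scored. */
int score_neighbors(const int32_t *neighbors, int count,
                    const float *vectors, size_t dims,
                    uint64_t *visited, const float *query,
                    int32_t *to_score, float *scores) {
    int n = 0;
    for (int i = 0; i < count; i++) {
        int32_t node = neighbors[i];
        if (!test_and_set_visited(visited, node)) {
            prefetch_vector(vectors + (size_t)node * dims, dims);
            to_score[n++] = node;
        }
    }
    for (int i = 0; i < n; i++) {
        scores[i] = dot_f32(query, vectors + (size_t)to_score[i] * dims, dims);
    }
    return n;
}

On a linux box, comparing LLC-load-miss counts from perf before and after should show fairly directly whether the scoring pass is now hitting cache.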

-Chris.

> Trevor
>
> [1] https://github.com/elastic/elasticsearch/pull/130635#issuecomment-3036314864
> [2] https://github.com/mccullocht/lucene-knn-oxide/compare/main...prefetch
>
> On Mon, Jul 7, 2025 at 1:43 PM Chris Hegarty <christopher.hega...@elastic.co.invalid> wrote:
>
> Thanks Trevor, this is helpful. It also reminded me to reply here too on the off-heap scoring.
>
> Scoring off-heap, and thus avoiding a copy, gains approx 2x in the vector comparisons - this is quite substantial and maybe aligns with what Trevor sees? However, because of a Hotspot bug, using MemorySegment is not yet an option for scoring float32s off-heap [1]. We'll get the Hotspot bug fixed, but in the meantime, to help evaluate the potential gain, I just wrote native float32 vector comparison operations to see how they perform in Elasticsearch [2].
>
> -Chris.
>
> [1] https://mail.openjdk.org/pipermail/panama-dev/2025-June/021070.html
> [2] https://github.com/elastic/elasticsearch/pull/130541
>
> > On 7 Jul 2025, at 21:01, Trevor McCulloch <trevor.mccull...@mongodb.com.INVALID> wrote:
> >
> > For comparison I put together a codec that is a copy of Lucene99 but with most of KnnVectorsReader.search() implemented in native code and called via FFM, as a way of examining overhead from the JVM. I didn't have the same data set, but on 1M 1536d float vectors with default Lucene99 settings it was about 10% faster in native code -- not nothing, but not very much considering the additional complexity of using native code. I was able to avoid copying data from the mmap backing store in order to perform distance comparisons, which was probably a significant chunk of the win. The vast majority of CPU time (~80%) is spent doing vector comparison, and something like 95-96% of that CPU time is spent scoring in L0. Decoding edges from the graph is ~1.5%. I think so long as vector scoring code is competitive and Lucene is doing a similar number of comparisons, the margin should be pretty close.
> >
> > I looked through their open source implementation and did not see anything that led me to believe that their HNSW implementation is substantially different in an algorithmic sense. They did have some logic around choosing different representations of the visited set depending on the expected number of nodes to visit (choosing between ~FixedBitSet and a hash set).
> >
> > On Mon, Jun 23, 2025 at 6:10 AM Benjamin Trent <ben.w.tr...@gmail.com> wrote:
> >
> > To my knowledge, FAISS isn't utilizing hand-rolled SIMD calculations. Do we know if it was compiled with `-ffast-math`?
> >
> > Vespa does utilize SIMD optimizations for vector comparisons.
> >
> > Some more ways I think Lucene is slower (though I am not sure the 2x is fully explained):
> >
> > - Reading floats onto heap float[] instead of accessing MemorySegments directly when scoring.
> > - We store the graph in a unique way that requires a decoding step when exploring a new candidate, reading in vints and doing a binary search. I think all other hnsw impls do flat arrays of int/long values.
> > - We always use SparseBitSet, which for smaller indices <1M can have a noticeable impact on performance. I have seen this in my own benchmarking (switching to FixedBitSet measurably improved query times on smaller data sets).
> >
> > Both of these are fairly "cheap", which might explain the FAISS 10% difference. However, I am not sure they fully explain the 2x difference with Vespa.
> >
> > On Thu, Jun 19, 2025 at 3:37 PM Adrien Grand <jpou...@gmail.com> wrote:
> >
> > Thanks Mike, this is useful information. Then I'll try to reproduce this benchmark to better understand what is happening.
> >
> > On Thu, Jun 19, 2025 at 8:16 PM Michael Sokolov <msoko...@gmail.com> wrote:
> >
> > We've recently been comparing Lucene's HNSW w/ FAISS' and there is not a 2x difference there. FAISS does seem to be around 10-15% faster I think? The 2x difference is roughly what I was seeing in comparisons w/ hnswlib prior to the dot-product improvements we made in Lucene.
> >
> > On Thu, Jun 19, 2025 at 2:12 PM Adrien Grand <jpou...@gmail.com> wrote:
> > >
> > > Chris,
> > >
> > > FWIW I was looking at luceneknn (https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/luceneknn/Dockerfile#L15) which is on 9.7, though I don't know if it enabled the incubating vector API at runtime?
> > >
> > > I hope that mentioning ANN benchmarks did not add noise to this thread; I was mostly looking at whether I could find another benchmark that suggests that Lucene is significantly slower in similar conditions. Does it align with other people's experience that Lucene is 2x slower or more compared with other good HNSW implementations?
> > >
> > > Adrien
> > >
> > > On Thu, Jun 19, 2025 at 18:44, Chris Hegarty <christopher.hega...@elastic.co.invalid> wrote:
> > >>
> > >> Hi Adrien,
> > >>
> > >> > Even though it uses Elasticsearch to run the benchmark, it really benchmarks Lucene functionality,
> > >>
> > >> Agreed.
> > >>
> > >> > This seems consistent with results from https://ann-benchmarks.com/index.html though I don't know if the cause of the performance difference is the same or not.
> > >>
> > >> On ann-benchmarks specifically: unless I'm looking in the wrong place, they're using Elasticsearch 8.7.0 [1], which predates our usage of the Panama Vector API for vector search. We added support for that in Lucene 9.7.0 -> Elasticsearch 8.9.0. So those benchmarks are wildly out of date, no?
> > >>
> > >> -Chris.
> > >>
> > >> [1] https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/elasticsearch/Dockerfile#L2
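
One more note on the incubating vector API question above: as far as I know, Lucene 9.7+ only uses the Panama vector implementations when the JVM is launched with the incubator module enabled, i.e. something like

java --add-modules jdk.incubator.vector ...

and otherwise it falls back to the scalar code. So it's worth checking how luceneknn starts the JVM before reading too much into those numbers.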

> > >> > On 19 Jun 2025, at 16:39, Adrien Grand <jpou...@gmail.com> wrote:
> > >> >
> > >> > Hello all,
> > >> >
> > >> > I have been looking at this benchmark against Vespa recently: https://blog.vespa.ai/elasticsearch-vs-vespa-performance-comparison/. (The report is behind an annoying email wall, but I'm copying relevant data below, so hopefully you don't need to download the report.) Even though it uses Elasticsearch to run the benchmark, it really benchmarks Lucene functionality; I don't believe that Elasticsearch does anything that meaningfully alters the results that you would get if you were to run Lucene directly.
> > >> >
> > >> > The benchmark seems designed to highlight the benefits of Vespa's realtime design, which is fair game I guess. But it also runs some queries in read-only scenarios where I was expecting Lucene to perform better.
> > >> >
> > >> > One thing that got me curious is that it reports about 2x worse latency and throughput for pure unfiltered vector search on a force-merged index (so single segment/graph). Does anybody know why Lucene's HNSW may perform slower than Vespa's HNSW? This seems consistent with results from https://ann-benchmarks.com/index.html though I don't know if the cause of the performance difference is the same or not.
> > >> >
> > >> > For reference, here are details that apply to both Lucene and Vespa's vector search:
> > >> > - HNSW,
> > >> > - float32 vectors, no quantization,
> > >> > - embeddings generated using Snowflake's Arctic-embed-xs model,
> > >> > - 1M docs,
> > >> > - 384 dimensions,
> > >> > - dot product,
> > >> > - m = 16,
> > >> > - max connections = 200,
> > >> > - search for top 10 hits,
> > >> > - no filter,
> > >> > - single client, so no search concurrency,
> > >> > - purple column is force-merged, so single segment/graph like Vespa.
> > >> >
> > >> > I never seriously looked at Lucene's vector search performance, so I'm very happy to be educated if I'm making naive assumptions!
> > >> >
> > >> > Somewhat related, is this the reason why I'm seeing many threads around bringing 3rd party implementations into Lucene, including ones that are very similar to Lucene on paper? To speed up vector search?
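
For what it's worth, those numbers also line up with the memory-latency observation at the top of the thread: 1M docs x 384 dims x 4 bytes per float is roughly 1.5 GB of raw vector data, far larger than a typical L3 cache, so with HNSW's essentially random access pattern most candidate vectors will be coming from DRAM.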

> > >> > --
> > >> > Adrien
> > >> > <vespa-vs-es-screenshot.png>

> > --
> > Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org