I think the concurrency is across segments or slices (= multiple small segments)? I.e. "thread per slice" model, not multiple threads in one slice.
But your cool PR would fix that limitation! https://github.com/apache/lucene/pull/12254 Mike McCandless http://blog.mikemccandless.com On Sun, Apr 23, 2023 at 6:53 PM Jonathan Ellis <jbel...@gmail.com> wrote: > Sure, I'm adding HNSW support to Cassandra. (Lots more detail on the > dev@cassandra list.) > > HnswGraph says "The graph may be searched by multiple threads > concurrently," but OnHeapHnswGraph has a field cur that gets modified by > seek, which is called by Searcher. Bug, or outdated comment? > > On Thu, Apr 20, 2023 at 1:45 PM Michael Sokolov <msoko...@gmail.com> > wrote: > >> Right RAVectorValues is just fronting an array of vectors and it >> doesn't have any intermediate storage or other state (like a file >> pointer) so it can support many simultaneous callers. Other >> implementations of the interface work differently; see >> OffHeapByteVectorValues, which is representing vectors in the index >> and implemented using I/O calls. >> >> If you shared some context about your interest here, we might be able >> to help you better. >> >> On Thu, Apr 20, 2023 at 1:22 PM Jonathan Ellis <jbel...@gmail.com> wrote: >> > >> > It looks like I misunderstood how the Builder works, and the RAVV >> provided to the constructor does not need to contain any values up front. >> Specifically, Lucene95HnswVectorsWriter.FieldWriter adds vectors >> incrementally to the RAVV that it gives to the builder as addValue is >> called. >> > >> > On Wed, Apr 19, 2023 at 1:37 PM Michael Sokolov <msoko...@gmail.com> >> wrote: >> >> >> >> That class is intended for use by the Lucene index writer - it's not >> >> designed as a general purpose class for re-use outside that context. >> >> And IndexWriter writes documents to disk in bulk. >> >> >> >> On Wed, Apr 19, 2023 at 3:54 PM Jonathan Ellis <jbel...@gmail.com> >> wrote: >> >> > >> >> > Thanks, Michael! >> >> > >> >> > Looking at the paper by Malkov and Yashunin, it looks like the >> algorithm allows for building the hnsw graph incrementally. Why does our >> implementation require specifying all the vectors up front to >> HnswGraphBuilder.create? >> >> > >> >> > On Wed, Apr 19, 2023 at 3:04 AM Michael Sokolov <msoko...@gmail.com> >> wrote: >> >> >> >> >> >> These vector values have internal buffers they use to return the >> vectors. In order to compare two vectors we need to use two independent >> sources so that one doesn't overwrite this internal state when fetching the >> second vector. >> >> >> >> >> >> Sorry I forgot the second question and can't see it on my phone. Brb >> >> >> >> >> >> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis <jbel...@gmail.com> >> wrote: >> >> >>> >> >> >>> HI all, a couple questions on how HNSW works: >> >> >>> >> >> >>> 1. What is driving the requirement for two copies of the input >> vectors? It looks like the RAVV implementations do shallow copies, so the >> vector from A is the same that would be returned by B. What am I missing? >> >> >>> >> >> >>> 2. What is the intended behavior when adding identical vectors to >> a HNSW? It looks like when I supply 10 identical vectors, they all get >> added to the graph, but when I search for the nearest neighbors, I only get >> one of them in the result set. >> >> >>> >> >> >>> -- >> >> >>> Jonathan Ellis >> >> >>> co-founder, http://www.datastax.com >> >> >>> @spyced >> >> > >> >> > >> >> > >> >> > -- >> >> > Jonathan Ellis >> >> > co-founder, http://www.datastax.com >> >> > @spyced >> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> >> > >> > >> > -- >> > Jonathan Ellis >> > co-founder, http://www.datastax.com >> > @spyced >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> > > -- > Jonathan Ellis > co-founder, http://www.datastax.com > @spyced >