I think the concurrency is across segments or slices (= multiple small
segments)?  I.e. "thread per slice" model, not multiple threads in one
slice.

But your cool PR would fix that limitation!
https://github.com/apache/lucene/pull/12254
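
For illustration only, here is a minimal sketch of the "thread per slice" idea, using plain Java with no Lucene dependency. Every name in it (SliceSearchSketch, NeighborCursor, sumNeighbors, searchAllSlices) is hypothetical and is not Lucene's API. NeighborCursor stands in for a graph view that keeps a mutable position, in the spirit of the cur field discussed below. The assumption is that the adjacency data is shared and read-only, and that each task creates its own cursor, so no mutable state crosses threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SliceSearchSketch {

    /** Hypothetical stand-in for a graph view that carries a mutable cursor. */
    static final class NeighborCursor {
        private final int[][] neighbors; // shared adjacency lists, never modified
        private int node = -1;           // mutable state: node selected by seek()
        private int pos = 0;             // mutable state: position in that node's list

        NeighborCursor(int[][] neighbors) {
            this.neighbors = neighbors;
        }

        /** Positions the cursor at the start of target's neighbor list. */
        void seek(int target) {
            node = target;
            pos = 0;
        }

        /** Returns the next neighbor of the sought node, or -1 when exhausted. */
        int nextNeighbor() {
            int[] list = neighbors[node];
            return pos < list.length ? list[pos++] : -1;
        }
    }

    /** Walks one slice with a cursor that is private to the calling thread. */
    static long sumNeighbors(int[][] slice) {
        NeighborCursor cursor = new NeighborCursor(slice);
        long sum = 0;
        for (int n = 0; n < slice.length; n++) {
            cursor.seek(n);
            for (int nb = cursor.nextNeighbor(); nb != -1; nb = cursor.nextNeighbor()) {
                sum += nb;
            }
        }
        return sum;
    }

    /** Submits one task per slice; slices run concurrently, each slice is single-threaded. */
    static long searchAllSlices(List<int[][]> slices) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, slices.size()));
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int[][] slice : slices) {
                Callable<Long> task = () -> sumNeighbors(slice);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<int[][]> slices = new ArrayList<>();
        slices.add(new int[][] {{1, 2}, {0}});
        slices.add(new int[][] {{1}, {0}, {0, 1}});
        System.out.println(searchAllSlices(slices)); // prints 5
    }
}
```

Sharing a single NeighborCursor between two threads would let one thread's seek reset the other's position, which is why this sketch confines each cursor to one task.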

Mike McCandless

http://blog.mikemccandless.com


On Sun, Apr 23, 2023 at 6:53 PM Jonathan Ellis <jbel...@gmail.com> wrote:

> Sure, I'm adding HNSW support to Cassandra.  (Lots more detail on the
> dev@cassandra list.)
>
> HnswGraph says "The graph may be searched by multiple threads
> concurrently," but OnHeapHnswGraph has a field cur that gets modified by
> seek, which is called by Searcher.  Bug, or outdated comment?
>
> On Thu, Apr 20, 2023 at 1:45 PM Michael Sokolov <msoko...@gmail.com>
> wrote:
>
>> Right, RAVectorValues is just fronting an array of vectors, and it
>> doesn't have any intermediate storage or other state (like a file
>> pointer), so it can support many simultaneous callers. Other
>> implementations of the interface work differently; see
>> OffHeapByteVectorValues, which represents vectors in the index
>> and is implemented using I/O calls.
>>
>> If you shared some context about your interest here, we might be able
>> to help you better.
>>
>> On Thu, Apr 20, 2023 at 1:22 PM Jonathan Ellis <jbel...@gmail.com> wrote:
>> >
>> > It looks like I misunderstood how the Builder works, and the RAVV
>> > provided to the constructor does not need to contain any values up front.
>> > Specifically, Lucene95HnswVectorsWriter.FieldWriter adds vectors
>> > incrementally to the RAVV that it gives to the builder as addValue is
>> > called.
>> >
>> > On Wed, Apr 19, 2023 at 1:37 PM Michael Sokolov <msoko...@gmail.com> wrote:
>> >>
>> >> That class is intended for use by the Lucene index writer - it's not
>> >> designed as a general purpose class for re-use outside that context.
>> >> And IndexWriter writes documents to disk in bulk.
>> >>
>> >> On Wed, Apr 19, 2023 at 3:54 PM Jonathan Ellis <jbel...@gmail.com> wrote:
>> >> >
>> >> > Thanks, Michael!
>> >> >
>> >> > Looking at the paper by Malkov and Yashunin, it looks like the
>> >> > algorithm allows for building the hnsw graph incrementally.  Why does our
>> >> > implementation require specifying all the vectors up front to
>> >> > HnswGraphBuilder.create?
>> >> >
>> >> > On Wed, Apr 19, 2023 at 3:04 AM Michael Sokolov <msoko...@gmail.com> wrote:
>> >> >>
>> >> >> These vector values have internal buffers they use to return the
>> >> >> vectors. In order to compare two vectors we need to use two independent
>> >> >> sources so that one doesn't overwrite this internal state when fetching the
>> >> >> second vector.
>> >> >>
>> >> >> Sorry I forgot the second question and can't see it on my phone. Brb
>> >> >>
>> >> >> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis <jbel...@gmail.com> wrote:
>> >> >>>
>> >> >>> Hi all, a couple of questions on how HNSW works:
>> >> >>>
>> >> >>> 1. What is driving the requirement for two copies of the input
>> >> >>> vectors?  It looks like the RAVV implementations do shallow copies, so the
>> >> >>> vector from A is the same that would be returned by B.  What am I missing?
>> >> >>>
>> >> >>> 2. What is the intended behavior when adding identical vectors to
>> >> >>> an HNSW graph?  It looks like when I supply 10 identical vectors, they all get
>> >> >>> added to the graph, but when I search for the nearest neighbors, I only get
>> >> >>> one of them in the result set.
>> >> >>>
>> >> >>> --
>> >> >>> Jonathan Ellis
>> >> >>> co-founder, http://www.datastax.com
>> >> >>> @spyced
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>
>>
>>
