Right, RAVectorValues is just fronting an array of vectors, and it
doesn't have any intermediate storage or other state (like a file
pointer), so it can support many simultaneous callers. Other
implementations of the interface work differently; see
OffHeapByteVectorValues, which represents vectors in the index
and is implemented using I/O calls.
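
To make the difference concrete, here is a minimal sketch in plain Java.
It does not use Lucene's classes: VectorSource, ArrayBacked, and Buffered
are hypothetical names made up for illustration, and Buffered only
simulates an I/O-backed implementation by decoding each vector into one
reusable buffer.

```java
public class SharedBufferDemo {

  /** Hypothetical stand-in for a random-access vector source; NOT Lucene's interface. */
  interface VectorSource {
    float[] vectorValue(int ord);

    VectorSource copy();
  }

  /** Fronts an in-memory array: no buffer and no file pointer, so one instance can be shared. */
  static final class ArrayBacked implements VectorSource {
    private final float[][] vectors;

    ArrayBacked(float[][] vectors) {
      this.vectors = vectors;
    }

    @Override
    public float[] vectorValue(int ord) {
      return vectors[ord]; // a distinct array per ordinal
    }

    @Override
    public VectorSource copy() {
      return this; // stateless, so handing out the same instance is safe
    }
  }

  /** Simulates an I/O-backed source (an assumption): every read lands in the same buffer. */
  static final class Buffered implements VectorSource {
    private final float[] flat; // stands in for the vector data stored in the index
    private final int dim;
    private final float[] buffer;

    Buffered(float[] flat, int dim) {
      this.flat = flat;
      this.dim = dim;
      this.buffer = new float[dim];
    }

    @Override
    public float[] vectorValue(int ord) {
      System.arraycopy(flat, ord * dim, buffer, 0, dim);
      return buffer; // the same array on every call
    }

    @Override
    public VectorSource copy() {
      return new Buffered(flat, dim); // independent buffer over the same data
    }
  }

  /** Squared Euclidean distance between vector ordA of source a and vector ordB of source b. */
  static float squaredDistance(VectorSource a, int ordA, VectorSource b, int ordB) {
    float[] x = a.vectorValue(ordA);
    float[] y = b.vectorValue(ordB); // if b == a and a is Buffered, this overwrites x
    float sum = 0f;
    for (int i = 0; i < x.length; i++) {
      float d = x[i] - y[i];
      sum += d * d;
    }
    return sum;
  }

  public static void main(String[] args) {
    VectorSource array = new ArrayBacked(new float[][] {{0f, 0f}, {3f, 4f}});
    VectorSource buffered = new Buffered(new float[] {0f, 0f, 3f, 4f}, 2);

    System.out.println(squaredDistance(array, 0, array, 1));              // 25.0
    System.out.println(squaredDistance(buffered, 0, buffered, 1));        // 0.0 (wrong)
    System.out.println(squaredDistance(buffered, 0, buffered.copy(), 1)); // 25.0
  }
}
```

The middle call goes wrong because fetching the second vector overwrites
the buffer that still holds the first one, so the distance is computed
between a vector and itself. Using an independent copy for the second
fetch avoids that, and the array-backed source never has the problem.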

If you shared some context about your interest here, we might be able
to help you better.

On Thu, Apr 20, 2023 at 1:22 PM Jonathan Ellis <jbel...@gmail.com> wrote:
>
> It looks like I misunderstood how the Builder works, and the RAVV provided to 
> the constructor does not need to contain any values up front.  Specifically, 
> Lucene95HnswVectorsWriter.FieldWriter adds vectors incrementally to the RAVV 
> that it gives to the builder as addValue is called.
>
> On Wed, Apr 19, 2023 at 1:37 PM Michael Sokolov <msoko...@gmail.com> wrote:
>>
>> That class is intended for use by the Lucene index writer - it's not
>> designed as a general purpose class for re-use outside that context.
>> And IndexWriter writes documents to disk in bulk.
>>
>> On Wed, Apr 19, 2023 at 3:54 PM Jonathan Ellis <jbel...@gmail.com> wrote:
>> >
>> > Thanks, Michael!
>> >
>> > Looking at the paper by Malkov and Yashunin, it looks like the algorithm 
>> > allows for building the hnsw graph incrementally.  Why does our 
>> > implementation require specifying all the vectors up front to 
>> > HnswGraphBuilder.create?
>> >
>> > On Wed, Apr 19, 2023 at 3:04 AM Michael Sokolov <msoko...@gmail.com> wrote:
>> >>
>> >> These vector values have internal buffers they use to return the vectors. 
>> >> In order to compare two vectors we need to use two independent sources so 
>> >> that one doesn't overwrite this internal state when fetching the second 
>> >> vector.
>> >>
>> >> Sorry I forgot the second question and can't see it on my phone. Brb
>> >>
>> >> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis <jbel...@gmail.com> wrote:
>> >>>
>> >>> Hi all, a couple of questions on how HNSW works:
>> >>>
>> >>> 1. What is driving the requirement for two copies of the input vectors?  
>> >>> It looks like the RAVV implementations do shallow copies, so the vector 
>> >>> from A is the same that would be returned by B.  What am I missing?
>> >>>
>> >>> 2. What is the intended behavior when adding identical vectors to a 
>> >>> HNSW?  It looks like when I supply 10 identical vectors, they all get 
>> >>> added to the graph, but when I search for the nearest neighbors, I only 
>> >>> get one of them in the result set.
>> >>>
>> >>> --
>> >>> Jonathan Ellis
>> >>> co-founder, http://www.datastax.com
>> >>> @spyced
>> >
>> >
>> >
>> > --
>> > Jonathan Ellis
>> > co-founder, http://www.datastax.com
>> > @spyced
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced
