Oh, identical vectors. That case is basically unsupported. If you create a large index filled with identical vectors, it leads to pathological behavior. This seems to be a weakness in the algorithm. If you have any ideas on how to improve that, they would be welcome. But in real-world scenarios, it doesn't seem to arise?
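
One plausible way identical vectors can degrade an HNSW-style graph is sketched below. It assumes a neighbor-selection "diversity" heuristic of the kind described in the HNSW paper: a candidate is kept only if it is strictly closer to the node being linked than to every neighbor already kept. Whether the implementation discussed in this thread uses exactly this rule is an assumption, and the function name `select_diverse_neighbors` is hypothetical, made up for illustration.

```python
import math


def select_diverse_neighbors(query, candidates, max_neighbors):
    """Hypothetical diversity heuristic (an assumption, not any library's API).

    Keeps a candidate only if it is strictly closer to `query` than to
    every neighbor that has already been kept.
    """
    selected = []
    # Visit candidates from nearest to farthest from the query.
    for cand in sorted(candidates, key=lambda v: math.dist(v, query)):
        if len(selected) >= max_neighbors:
            break
        d_query = math.dist(cand, query)
        if all(d_query < math.dist(cand, kept) for kept in selected):
            selected.append(cand)
    return selected


# Ten copies of the same vector: every pairwise distance is 0, so after the
# first candidate is kept, the strict "closer than" test can never pass again.
identical = [(1.0, 2.0, 3.0)] * 10
kept_identical = select_diverse_neighbors((1.0, 2.0, 3.0), identical, 16)

# Distinct vectors spread around the query can pass the same test.
distinct = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
kept_distinct = select_diverse_neighbors((0.1, 0.1), distinct, 16)

print(len(kept_identical), len(kept_distinct))
```

Under this assumed rule each duplicate ends up linked to very few other nodes, so a search can reach one copy and have no edges leading to the rest, which would be consistent with getting only one of the 10 identical vectors back.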
On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis <jbel...@gmail.com> wrote:

> Hi all, a couple questions on how HNSW works:
>
> 1. What is driving the requirement for two copies of the input vectors?
> It looks like the RAVV implementations do shallow copies, so the vector
> from A is the same that would be returned by B. What am I missing?
>
> 2. What is the intended behavior when adding identical vectors to a HNSW?
> It looks like when I supply 10 identical vectors, they all get added to the
> graph, but when I search for the nearest neighbors, I only get one of them
> in the result set.
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced