On Fri, Apr 7, 2023 at 5:13 PM Benjamin Trent <ben.w.tr...@gmail.com> wrote:
>
> From all I have seen when hooking up JFR when indexing a medium number of 
> vectors(1M +), almost all the time is spent simply comparing the vectors 
> (e.g. dot_product).
>
> This indicates to me that another algorithm won't really help index build 
> time tremendously. Unless others do dramatically fewer vector comparisons 
> (from what I can tell, this is at least not true for DiskAnn, unless some 
> fancy footwork is done when building the PQ codebook).
>
> I would also say comparing vector index build time to indexing terms are 
> apples and oranges. Yeah, they both live in Lucene, but the number of 
> calculations required (no matter the data structure used), will be magnitudes 
> greater.
>

I'm not sure, i think this slowness due to massive number of
comparisons is just another side effect of the unscalable algorithm.
It is designed to build an in-memory datastructure and "merge" means
"rebuild". And since we fully rebuild a new one when merging, you get
something like O(n^2) total indexing when you take merges into
account.
Some of the other algorithms... in fact support merging. The DiskANN
paper has like a "chapter" on this.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to