I think that for testing performance and scalability one can also use
synthetic data; it does not have to be real-world data in the sense of
vectors generated from real-world text.
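
For the raw indexing/search throughput side, even random unit vectors would
exercise the machinery. A minimal sketch of what I mean (document count,
dimensionality and seed are arbitrary illustrative choices, and random vectors
of course won't reproduce the neighborhood structure of real embeddings):

import java.util.Random;

// Rough sketch only: random unit vectors as synthetic benchmark input.
public class SyntheticVectors {
  public static void main(String[] args) {
    int numDocs = 1_000_000; // assumed corpus size for a scalability run
    int dims = 2048;         // assumed target dimensionality
    Random random = new Random(42);
    for (int i = 0; i < numDocs; i++) {
      float[] v = new float[dims];
      double norm = 0;
      for (int d = 0; d < dims; d++) {
        v[d] = (float) random.nextGaussian();
        norm += v[d] * v[d];
      }
      float invNorm = (float) (1.0 / Math.sqrt(norm));
      for (int d = 0; d < dims; d++) {
        v[d] *= invNorm; // unit length, so dot-product similarity is well-behaved
      }
      // hand v to whatever indexing or benchmark harness is in use
    }
  }
}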
But the more people revisit performance and scalability testing the
better, and any help on this would be great!
Thanks
Michael W
On 09.04.23 at 20:43, Dawid Weiss wrote:
We do have a dataset built from Wikipedia in luceneutil. It comes in 100- and
300-dimensional varieties, and we can easily generate large numbers of
vector documents from the article data. To go higher we could concatenate
vectors from that, and I believe the performance numbers would be plausible.
Apologies - I wasn't clear - I was thinking of building 1k- or
2k-dimensional vectors that would be realistic. Perhaps using GloVe, or
perhaps some other software, but something that would accurately reflect
a true 2k-dimensional space with "real" data underneath. I am not
familiar enough with the field to tell whether a simple concatenation
is a good enough simulation - perhaps it is.
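
To make "simple concatenation" concrete, I assume it would mean something like
the sketch below: glue several of the existing 100/300-dim vectors together
and renormalize to unit length (the helper is purely illustrative, not
existing luceneutil code):

import java.util.List;

final class ConcatVectors {
  // Illustrative only: build a higher-dimensional test vector by
  // concatenating lower-dimensional ones and renormalizing.
  static float[] concat(List<float[]> parts) {
    int total = 0;
    for (float[] p : parts) {
      total += p.length;
    }
    float[] out = new float[total];
    int offset = 0;
    for (float[] p : parts) {
      System.arraycopy(p, 0, out, offset, p.length);
      offset += p.length;
    }
    double norm = 0;
    for (float v : out) {
      norm += v * v;
    }
    float invNorm = (float) (1.0 / Math.sqrt(norm));
    for (int i = 0; i < out.length; i++) {
      out[i] *= invNorm;
    }
    return out;
  }
}

So e.g. ConcatVectors.concat(List.of(v1, v2, v3, v4)) on four 300-dim vectors
gives a 1200-dim test vector; whether that space behaves like a genuine
1-2k-dimensional embedding space is exactly the open question.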
I would really prefer to focus on this kind of assessment of
feasibility/limitations rather than arguing back and forth. I did my
experiment a while ago and I can't really tell whether there have been
improvements in the indexing/merging part - your email contradicts my
experience, Mike, so I'm a bit intrigued and would like to revisit it.
But it'd be ideal to work with real vectors rather than a simulation.
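
For whoever picks this up, a minimal sketch of what such an indexing/merge
timing run could look like with Lucene's float vector fields. Field name,
paths, dimensionality and doc count are placeholders I made up, and if I
recall correctly stock Lucene 9.x still caps vector dimensions at 1024, so a
genuine 2k-dim run needs that limit raised locally:

import java.nio.file.Paths;
import java.util.Random;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnFloatVectorField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.store.FSDirectory;

public class VectorIndexingBench {
  public static void main(String[] args) throws Exception {
    int dims = 1024;        // placeholder; the interesting runs are 1k-2k
    int numDocs = 100_000;  // placeholder corpus size
    Random random = new Random(0);

    try (FSDirectory dir = FSDirectory.open(Paths.get("/tmp/knn-bench"));
         IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      long start = System.nanoTime();
      for (int i = 0; i < numDocs; i++) {
        float[] v = new float[dims];
        double norm = 0;
        for (int d = 0; d < dims; d++) {
          v[d] = (float) random.nextGaussian();
          norm += v[d] * v[d];
        }
        float invNorm = (float) (1.0 / Math.sqrt(norm));
        for (int d = 0; d < dims; d++) {
          v[d] *= invNorm;
        }
        Document doc = new Document();
        doc.add(new KnnFloatVectorField("vec", v, VectorSimilarityFunction.DOT_PRODUCT));
        writer.addDocument(doc);
      }
      // forceMerge so the HNSW graph rebuild during merging is included in the timing
      writer.forceMerge(1);
      long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      System.out.println("indexed + merged " + numDocs + " docs of dim " + dims
          + " in " + elapsedMs + " ms");
    }
  }
}

Feeding it real vectors instead of random ones is then just a matter of
replacing the generation loop with a file reader.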
Dawid
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org