> We do have a dataset built from Wikipedia in luceneutil. It comes in 100-
> and 300-dimensional varieties, and we can easily enough generate large
> numbers of vector documents from the articles data. To go higher we could
> concatenate vectors from that, and I believe the performance numbers would
> be plausible.

Apologies - I wasn't clear. I was thinking of building 1k- or
2k-dimensional vectors that would be realistic - perhaps using GloVe,
or perhaps some other software, but something that would accurately
reflect a true 2k-dimensional space with "real" data underneath. I am
not familiar enough with the field to tell whether a simple
concatenation is a good enough simulation - perhaps it is.

I would really prefer to focus on doing this kind of assessment of
feasibility/limitations rather than arguing back and forth. I did my
experiment a while ago and I can't really tell whether there have been
improvements in the indexing/merging part - your email contradicts my
experience, Mike, so I'm a bit intrigued and would like to revisit it.
But it'd be ideal to work with real vectors rather than a simulation.

Dawid
