Well, it does matter to some degree since picking random vectors tends to
give you dense vectors whereas text gives you very sparse vectors.
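To make the dense/sparse contrast concrete, here is a minimal sketch (plain Python, not Mahout code). It counts the non-zero entries in a random Gaussian vector and in a bag-of-words term-frequency vector for a short piece of text. The 10,000-term vocabulary size and the helper names are hypothetical.

```python
import random

def random_dense_vector(dim, seed=0):
    """Random Gaussian vector: in practice every coordinate is non-zero."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def term_frequency_vector(text):
    """Sparse bag-of-words vector stored as {term: count}; absent terms are zero."""
    counts = {}
    for term in text.lower().split():
        counts[term] = counts.get(term, 0) + 1
    return counts

dim = 10_000  # hypothetical vocabulary size
dense = random_dense_vector(dim)
sparse = term_frequency_vector("the cat sat on the mat")

print(sum(1 for x in dense if x != 0.0), "non-zeros out of", dim)
print(len(sparse), "non-zeros out of", dim)  # 5 non-zeros out of 10000
```

A random test vector fills essentially every dimension. A document touches only the handful of terms it actually contains.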

Another issue is that raw text without a kill list (a stop-word list) gives you
sparse vectors in which common words are always non-zero.
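A minimal sketch of why the kill list matters, assuming a tiny hypothetical stop-word set and toy documents. The sketch checks which terms are non-zero in *every* document vector, with and without the filter.

```python
KILL_LIST = frozenset({"the", "a", "of", "and", "on", "to", "is"})  # hypothetical stop words

def tf_vector(text, kill_list=frozenset()):
    """Term-frequency vector as {term: count}, skipping any term in kill_list."""
    counts = {}
    for term in text.lower().split():
        if term in kill_list:
            continue
        counts[term] = counts.get(term, 0) + 1
    return counts

def shared_terms(vectors):
    """Terms that are non-zero in every vector."""
    common = set(vectors[0])
    for v in vectors[1:]:
        common &= set(v)
    return common

docs = ["the cat sat on the mat", "the dog ate the bone", "a bird is on the roof"]
raw = [tf_vector(d) for d in docs]
filtered = [tf_vector(d, KILL_LIST) for d in docs]

print(shared_terms(raw))       # {'the'}
print(shared_terms(filtered))  # set()
```

Without filtering, "the" shows up as a non-zero coordinate in every vector. After filtering, the documents no longer share any coordinate.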

Different patterns of sparsity can cause radically different time complexity
for the clustering.
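One way to see how sparsity drives cost: clustering algorithms such as k-means spend most of their time on vector-to-centroid similarity or distance computations. With a sparse representation, a dot product only needs to visit the non-zero entries of the smaller vector. The sketch below is a generic dictionary-based illustration, not Mahout's vector implementation. It returns the dot product together with the number of lookups it performed.

```python
def sparse_dot(u, v):
    """Dot product of two {index: value} vectors.

    Iterates over the smaller vector, so the work is proportional to
    min(nnz(u), nnz(v)) dictionary lookups rather than the full dimension.
    Returns (dot, number_of_lookups).
    """
    if len(u) > len(v):
        u, v = v, u
    total = 0.0
    lookups = 0
    for i, x in u.items():
        lookups += 1
        y = v.get(i)
        if y is not None:
            total += x * y
    return total, lookups

dim = 10_000
dense_a = {i: 1.0 for i in range(dim)}  # like a random dense vector
dense_b = {i: 2.0 for i in range(dim)}
sparse_a = {3: 1.0, 17: 2.0, 42: 1.0}   # like a short document
sparse_b = {17: 3.0, 99: 1.0}

print(sparse_dot(dense_a, dense_b))    # (20000.0, 10000)
print(sparse_dot(sparse_a, sparse_b))  # (6.0, 2)
```

Dense random vectors cost the full dimension on every comparison. Sparse text vectors cost only a few lookups. This is why benchmarks run on random vectors can say little about run time on real text.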

On Mon, Jul 27, 2009 at 11:05 AM, nfantone wrote:

> > I'm not sure why testing with random vectors would be all that useful
> > other than it shows it runs.  I wouldn't expect anything useful to come
> > out of it, though.
>
> Well... my point was that it really doesn't matter how you create the
> Vectors: it's the size of the final file/s that's relevant. Then
> again, that IS the problem behind all: it runs - and that's about all
> it does, for now.
>



-- 
Ted Dunning, CTO
DeepDyve
