Re: Clustering from DB

Ted Dunning Mon, 27 Jul 2009 11:22:17 -0700

Well, it does matter to some degree since picking random vectors tends to
give you dense vectors whereas text gives you very sparse vectors.

Another issue is that raw text without a kill list (stop-word list) gives
you sparse vectors in which the common words are almost always non-zero.

Different patterns of sparsity can cause radically different time complexity
for the clustering.
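
To make the point concrete, here is a minimal sketch in Python. It is not
Mahout code, and every name in it (random_dense_vector, text_sparse_vector,
squared_distance, the toy vocabulary) is made up for illustration. It assumes
vectors are stored as index-to-value maps and that a distance computation only
visits stored entries, so the work per distance grows with the number of
non-zeros rather than with the nominal dimension.

```python
import random


def random_dense_vector(dim, rng):
    """Random vector: every one of the dim components gets a stored value."""
    return {i: rng.random() for i in range(dim)}


def text_sparse_vector(tokens, vocab_index, stop_words=frozenset()):
    """Term-count vector: only words that occur in the text are stored.

    stop_words plays the role of the kill list; words in it are skipped.
    """
    vec = {}
    for tok in tokens:
        if tok in stop_words:
            continue
        idx = vocab_index[tok]
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def squared_distance(a, b):
    """Squared Euclidean distance, visiting only indices stored in a or b."""
    keys = a.keys() | b.keys()
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = {"the": 0, "of": 1, "cluster": 2, "vector": 3, "mahout": 4}
    doc1 = "the cluster of the vector".split()
    doc2 = "the mahout of the cluster".split()
    kill_list = frozenset({"the", "of"})

    dense = random_dense_vector(1000, rng)
    raw1 = text_sparse_vector(doc1, vocab)
    raw2 = text_sparse_vector(doc2, vocab)
    kept1 = text_sparse_vector(doc1, vocab, kill_list)
    kept2 = text_sparse_vector(doc2, vocab, kill_list)

    print("stored entries, random vector:", len(dense))
    print("stored entries, raw text:", len(raw1))
    print("stored entries, text with kill list:", len(kept1))
    # Indices shared by both documents: common words inflate this overlap.
    print("shared indices, raw text:", sorted(raw1.keys() & raw2.keys()))
    print("shared indices, with kill list:", sorted(kept1.keys() & kept2.keys()))
```

Two files of the same size can therefore behave very differently: the random
vector stores every component, while the text vectors store only a handful,
and without a kill list the common-word components show up in nearly every
document.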

On Mon, Jul 27, 2009 at 11:05 AM, nfantone <[email protected]> wrote:

> > I'm not sure why testing with Random vectors would be all that useful
> > other than it shows it runs.  I wouldn't expect anything useful to come
> > out of it, though.
>
> Well... my point was that it really doesn't matter how you create the
> Vectors: it's the size of the final file/s that's relevant. Then
> again, that IS the problem behind all: it runs - and that's about all
> it does, for now.
>

-- 
Ted Dunning, CTO
DeepDyve
