Yeah, that probably kills the idea, doesn't it? The 'best' centroid is well
defined this way, but searching for it may be completely unreasonable. I
see why counts don't have this problem.

On Sep 1, 2009 7:17 PM, "Ted Dunning" <[email protected]> wrote:

On Tue, Sep 1, 2009 at 9:44 AM, Sean Owen <[email protected]> wrote:
> Centroids are just strings th...
Easy to say that.

Very hard to compute. And the dimensionality is unbounded, so the centroid
is poorly behaved: you wind up with centroids that are a large number of
edits away from everything and nearly the same distance from everything.
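To make the "very hard to compute" point concrete, here is a brute-force sketch of searching for the 'best' centroid, i.e. the string minimizing the summed edit distance to the samples. The helper name `median_string`, the restricted alphabet and the length cap are assumptions for illustration only; the real search space has no length cap, which is exactly the problem. Even with the caps, the number of candidates grows as |alphabet|^max_len, and each candidate costs one edit-distance computation per sample.

```python
from itertools import product

def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance via a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def median_string(samples, alphabet, max_len):
    """Hypothetical helper: exhaustively find the string over `alphabet`
    (length <= max_len) minimizing total edit distance to `samples`.
    Returns (best string, its total distance, number of candidates tried)."""
    best, best_cost, tried = None, None, 0
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            cand = "".join(chars)
            tried += 1
            cost = sum(levenshtein(cand, s) for s in samples)
            if best_cost is None or cost < best_cost:  # first minimum wins ties
                best, best_cost = cand, cost
    return best, best_cost, tried

print(median_string(["abc", "abd", "abe"], "abcde", 3))  # ('abc', 2, 156)
```

Three three-letter samples over a five-letter alphabet already mean 156 candidates; a realistic alphabet and string length make the exhaustive search hopeless.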


> ...

> > Anything else that doesn't map? Haven't thought about it a lot but don't
> > yet see why k-means...
Depends on what you mean by well-behaved.  Mathematically speaking, string
edit measures are moderately well behaved.  Computationally and practically,
however, edit distances are not so nice.
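A small sketch of both halves of that claim, assuming plain Levenshtein distance (unit-cost insert, delete, substitute) as the edit measure, which the thread does not pin down. Mathematically it is a metric, spot-checked below with the triangle inequality on a few words; computationally, every single comparison fills a len(a) * len(b) dynamic-programming table, and nothing turns a set of strings into something you can average.

```python
from itertools import permutations

def levenshtein_with_cost(a: str, b: str) -> tuple[int, int]:
    """Return (edit distance, number of DP cells filled)."""
    prev = list(range(len(b) + 1))
    cells = 0
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
            cells += 1
        prev = cur
    return prev[-1], cells

# Spot-check the triangle inequality d(x, z) <= d(x, y) + d(y, z).
words = ["kitten", "sitting", "mitten", "fitting"]
for x, y, z in permutations(words, 3):
    d_xz = levenshtein_with_cost(x, z)[0]
    assert d_xz <= levenshtein_with_cost(x, y)[0] + levenshtein_with_cost(y, z)[0]

print(levenshtein_with_cost("kitten", "sitting"))  # (3, 42)
```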

Counts of common n-grams are much nicer since they can be interpreted as
vectors.
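A minimal sketch of the count-based alternative, assuming character bigrams and squared Euclidean distance (the thread specifies neither; the function names are hypothetical). Once each string is a sparse count vector, the centroid is just the element-wise mean, computed in one pass with no search at all.

```python
from collections import Counter

def ngram_counts(s: str, n: int = 2) -> Counter:
    """Sparse vector of character n-gram counts (no padding)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def centroid(vectors):
    """Element-wise mean of sparse count vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts key by key
    return {k: c / len(vectors) for k, c in total.items()}

def sq_distance(u, v):
    """Squared Euclidean distance between two sparse vectors."""
    keys = set(u) | set(v)
    return sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys)

c = centroid([ngram_counts("abc"), ngram_counts("abd")])
print(c)                                  # {'ab': 1.0, 'bc': 0.5, 'bd': 0.5}
print(sq_distance(ngram_counts("abc"), c))  # 0.5
```

Unlike the edit-distance centroid, this one is cheap and unique, which is why counts slot straight into k-means.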



--
Ted Dunning, CTO
DeepDyve
