Yes.  Projection onto the sphere helps.  Doing this to a sequence file full
of vectors should be pretty easy since you just have to do v.normalize(2).
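
A rough sketch of that projection, in case it helps.  It uses plain double[]
arrays so it stands alone; the class and method names (SphereProjection,
normalize, normalizeAll) are made up for illustration and are not Mahout API.
It also assumes a zero vector should be passed through unchanged rather than
divided by zero.  With real Mahout vectors the v.normalize(2) call above
should take the place of the inner loop.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Mahout): projects vectors onto the unit sphere.
final class SphereProjection {
    private SphereProjection() {}

    // Returns a copy of v scaled to unit Euclidean (L2) length.
    // Assumption: a zero vector has no direction, so it is returned unchanged.
    static double[] normalize(double[] v) {
        double sumSquares = 0.0;
        for (double x : v) {
            sumSquares += x * x;
        }
        double norm = Math.sqrt(sumSquares);
        double[] out = Arrays.copyOf(v, v.length);
        if (norm == 0.0) {
            return out;
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= norm;
        }
        return out;
    }

    // Normalizes every row independently, e.g. the rows of U from an SVD.
    static double[][] normalizeAll(double[][] rows) {
        double[][] out = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = normalize(rows[i]);
        }
        return out;
    }
}
```

After this, every nonzero row has length 1, so only its direction is left for
the clustering step to look at.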

But no.  The fundamental problems with eigenspokes have a lot to do with
small counts and excessive weighting of coincidence.  To fix that you really
need to go to a probabilistic projection method like LDA.

On Thu, Dec 9, 2010 at 1:54 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Hi everyone.
>
> I was thinking about the eigenspokes problem. Actually, I briefly looked
> through one paper about it.
>
>
> We basically said cluster detection doesn't work well on them. But it would
> seem to me that's just a matter of geometrical convenience. if we convert U
> stuff into hyperspherical vectors (and exclude the second norm from it),
> shouldn't that representation actually have very nice centroids?
>
> Or am I missing something fundamental here?
>
> But if that solves the problem, then it looks like we could have a
> preprocessor for clustering algorithms converting SVD output into
> hyperspherical vectors. so this basically would allow to run clustering
> after dimensionality reduction (and there's another reason why i wanted to
> do that but that's another discussion's subject).
>
> Thanks.
> -Dmitriy
>
