Yes. Projection onto the sphere helps. Doing this to a SequenceFile full of vectors should be pretty easy, since all you have to do is call v.normalize(2) on each vector.
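Here is a minimal, self-contained Java sketch of what that projection does to each vector, assuming the intent of v.normalize(2) is to divide by the Euclidean (L2) norm so every vector lands on the unit hypersphere. The helpers `normalizeL2` and `projectAll` are hypothetical. Plain `double[]` arrays stand in for Mahout `Vector`s, and an in-memory list stands in for the SequenceFile. This is not Mahout's own code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SphereProjection {

    // Hypothetical stand-in for v.normalize(2): divides every component by
    // the vector's Euclidean (L2) norm so the result lies on the unit
    // hypersphere. Returns a new array; zero vectors are returned unchanged
    // (an assumption of this sketch).
    static double[] normalizeL2(double[] v) {
        double sumSq = 0.0;
        for (double x : v) {
            sumSq += x * x;
        }
        double norm = Math.sqrt(sumSq);
        double[] out = v.clone();
        if (norm == 0.0) {
            return out;
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= norm;
        }
        return out;
    }

    // Applies the projection to every vector. The list is a hypothetical
    // in-memory substitute for vectors read from a SequenceFile.
    static List<double[]> projectAll(List<double[]> vectors) {
        List<double[]> result = new ArrayList<>(vectors.size());
        for (double[] v : vectors) {
            result.add(normalizeL2(v));
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> rows = List.of(
            new double[] {0.3, 0.4},  // two points on the same "spoke",
            new double[] {3.0, 4.0},  // differing only in magnitude
            new double[] {0.0, 0.0}); // degenerate zero vector
        for (double[] p : projectAll(rows)) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

Points on the same spoke, such as (0.3, 0.4) and (3.0, 4.0), map to (essentially) the same unit vector, which is why the projection helps with the spoke geometry. It does not change the underlying counts.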
But no. The fundamental problem with eigenspokes has a lot to do with small counts and excessive weighting of coincidences. To fix that, you really need to go to a probabilistic projection method like LDA.

On Thu, Dec 9, 2010 at 1:54 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Hi everyone.
>
> i was thinking about eigenspokes problem. Actually briefly looked thru one
> paper about it.
>
>
> We basically said cluster detection doesn't work well on them. But it would
> seem to me that's just a matter of geometrical convenience. if we convert U
> stuff into hyperspherical vectors (and exclude the second norm from it),
> shouldn't that representation actually have very nice centroids?
>
> Or i am missing something fundamental here?
>
> But if that solves the problem, then it looks like we could have a
> preprocessor for clustering algorithms converting SVD output into
> hyperspherical vectors. so this basically would allow to run clustering
> after dimensionality reduction (and there's another reason why i wanted to
> do that but that's another discussion's subject).
>
> Thanks.
> -Dmitriy
>
