Hi all,
Thanks once more for everyone's help so far; it has been extremely
fruitful. I'm about 98% of the way through my first sprint, but
unfortunately there is a single error on my second-to-last line of code.
Right after performing an eigen-decomposition using the
DistributedLanczosSolver, I feed the outputs directly into the KMeans
utility, RandomSeedGenerator, in order to create random cluster
centroids for a given K. Unfortunately, during that buildRandom() method
call, I hit an index out of bounds exception, and it seems to be an
off-by-1 problem (for k=3, the arrays generated are only of length 2).
More detail can be found here:
http://spectrallyclustered.wordpress.com/2010/06/18/sprint-1-so-very-close/
I think part of the problem is due to a lack of understanding of the
LanczosSolver process. I do know that the eigenvectors are returned as
rows in a matrix, in which case the data points I need to feed to KMeans
are the columns. How does the desiredRank parameter fit in when it's
returning a row matrix? The rule of thumb I'm using is that # of
clusters = # of eigenvectors. Is there any way to enforce this heuristic
explicitly?
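To make the layout I have in mind concrete, here is a minimal sketch in
plain Java. It does not use any Mahout classes: the class
EigenRowsToPoints and the method toPoints are hypothetical names made up
for illustration, and I'm assuming the eigenvectors arrive as the rows of
a dense array. It transposes the rows into one point per column and fails
early if the number of eigenvectors differs from k, which is the
heuristic I would like to enforce.

```java
public class EigenRowsToPoints {

    // Hypothetical helper, not part of Mahout.
    // eigenRows: one eigenvector per row (assumed layout of the solver output).
    // Returns one data point per original column, each of dimension k.
    static double[][] toPoints(double[][] eigenRows, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive, got " + k);
        }
        // Enforce the rule of thumb: # of clusters == # of eigenvectors.
        if (eigenRows.length != k) {
            throw new IllegalArgumentException(
                "expected " + k + " eigenvectors but got " + eigenRows.length);
        }
        int n = eigenRows[0].length;
        double[][] points = new double[n][k];
        for (int i = 0; i < k; i++) {
            if (eigenRows[i].length != n) {
                throw new IllegalArgumentException(
                    "eigenvector " + i + " has length " + eigenRows[i].length
                        + ", expected " + n);
            }
            // Transpose: entry j of eigenvector i becomes coordinate i of point j.
            for (int j = 0; j < n; j++) {
                points[j][i] = eigenRows[i][j];
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Made-up values: 3 eigenvectors over 4 data points.
        double[][] eigenRows = {
            {0.1, 0.2, 0.3, 0.4},
            {0.5, 0.6, 0.7, 0.8},
            {0.9, 1.0, 1.1, 1.2}
        };
        double[][] points = toPoints(eigenRows, 3);
        System.out.println(points.length + " points of dimension " + points[0].length);
    }
}
```

With a check like this in front of the seeding step, a solver output with
only 2 rows for k=3 would be reported as a rank mismatch instead of
surfacing later as an index out of bounds error.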
Any insights here would be greatly appreciated; I've posted a patch with
my latest code on JIRA. Thanks so much!
Regards,
Shannon