Hi all,

Thanks once more for everyone's help so far; it's been extremely fruitful. I'm about 98% of the way through my first sprint, but unfortunately there is a single error on my second-to-last line of code.

Right after performing an eigendecomposition with the DistributedLanczosSolver, I feed the output directly into the KMeans utility RandomSeedGenerator to create random cluster centroids for a given k. During the buildRandom() call I hit an index-out-of-bounds exception, and it looks like an off-by-one problem: for k=3, the generated arrays have length 2.

More detail can be found here: http://spectrallyclustered.wordpress.com/2010/06/18/sprint-1-so-very-close/

I think part of the problem is my incomplete understanding of how the LanczosSolver works. I do know that the eigenvectors are returned as the rows of a matrix, so the data points I need to feed to KMeans are its columns. How does the desiredRank parameter fit in when the solver returns a row matrix? The rule of thumb I'm using is that # of clusters = # of eigenvectors. Is there any way to enforce this heuristic explicitly?
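
To make the row/column layout I'm describing concrete, here is a minimal sketch in plain Java. It deliberately uses no Mahout classes: EigenRowsToPoints and columnsAsPoints are made-up names, the numbers are dummy values, and it only assumes the eigenvectors arrive as the rows of a k x n array. It transposes them into n points of dimension k and checks that dimension against the requested k, which is where I would expect a mismatch like mine to show up.

```java
final class EigenRowsToPoints {

    /**
     * Treats each row of eigenRows as one eigenvector (k rows, n entries each)
     * and returns the n columns as points of dimension k.
     */
    static double[][] columnsAsPoints(double[][] eigenRows) {
        int k = eigenRows.length;
        if (k == 0) {
            return new double[0][0];
        }
        int n = eigenRows[0].length;
        double[][] points = new double[n][k];
        for (int i = 0; i < k; i++) {
            if (eigenRows[i].length != n) {
                throw new IllegalArgumentException("row " + i + " has a different length");
            }
            for (int j = 0; j < n; j++) {
                points[j][i] = eigenRows[i][j]; // column j becomes point j
            }
        }
        return points;
    }

    public static void main(String[] args) {
        int requestedK = 3; // number of clusters asked for

        // Dummy stand-in for solver output: only 2 eigenvectors over 4 data points.
        double[][] eigenRows = {
            {0.1, 0.2, 0.3, 0.4},
            {0.5, 0.6, 0.7, 0.8},
        };

        double[][] points = columnsAsPoints(eigenRows);
        System.out.println(points.length + " points of dimension " + points[0].length);

        // Catch the mismatch before handing the points to a clustering step.
        if (points[0].length != requestedK) {
            System.out.println("Dimension mismatch: requested k=" + requestedK
                + " but got " + points[0].length + " eigenvectors");
        }
    }
}
```

A check like the last one would at least turn the index-out-of-bounds failure into an explicit message about how many eigenvectors actually came back.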

Any insights here would be greatly appreciated; I've posted a patch with my latest code on JIRA. Thanks so much!

Regards,
Shannon
