Hi all,

Hopefully these two questions will be my last, at least until my next sprint... :)

I've run the EigenVerification task, and from what I can tell it modifies, in place, the SequenceFiles that contain the results of the LanczosSolver. My first question is fairly straightforward. Following Jake's earlier suggestion, I set desiredRank for the LanczosSolver to 1.2 to 1.5 times the rank I actually want, and then I need to discard the highest-order eigenvectors until exactly the rank I want remains. How do I actually discard the extra rows in the SequenceFiles? I tried building a DistributedRowMatrix from the results and hard-setting the number of rows, but all of the rows written by the LanczosSolver still showed up.
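To make the oversample-then-trim step concrete, here is a minimal in-memory sketch. It assumes the eigenpairs have already been read out of the SequenceFiles into a list and that the ones to keep are those with the largest eigenvalues (flip the comparator if your ordering differs). `EigenTrim`, `EigenPair`, `oversampledRank` and `keepTop` are hypothetical names, not Mahout API, and writing the kept vectors back out to a new SequenceFile is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class EigenTrim {

    /** Hypothetical in-memory holder for one eigenvalue and its eigenvector. */
    public static final class EigenPair {
        final double eigenvalue;
        final double[] vector;

        public EigenPair(double eigenvalue, double[] vector) {
            this.eigenvalue = eigenvalue;
            this.vector = vector;
        }
    }

    /** Rank to request from the solver: the wanted rank times an oversampling factor, rounded up. */
    public static int oversampledRank(int wantedRank, double factor) {
        return (int) Math.ceil(wantedRank * factor);
    }

    /**
     * Keeps exactly wantedRank eigenpairs and discards the rest.
     * Assumption: the pairs worth keeping are those with the largest eigenvalues.
     */
    public static List<EigenPair> keepTop(List<EigenPair> pairs, int wantedRank) {
        List<EigenPair> sorted = new ArrayList<>(pairs);
        sorted.sort(Comparator.comparingDouble((EigenPair p) -> p.eigenvalue).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(wantedRank, sorted.size())));
    }

    public static void main(String[] args) {
        int wanted = 3;
        int requested = oversampledRank(wanted, 1.5); // ceil(4.5) = 5
        List<EigenPair> pairs = new ArrayList<>();
        for (int i = 0; i < requested; i++) {
            // Stand-in eigenpairs; real ones would come from the solver output
            pairs.add(new EigenPair(i * 0.5, new double[] {i}));
        }
        for (EigenPair p : keepTop(pairs, wanted)) {
            System.out.println(p.eigenvalue);
        }
    }
}
```

Trimming a copied list rather than mutating the solver output keeps the full oversampled result available if a different cutoff turns out to be needed.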

Part of this spectral clustering approach is using the components of the eigenvectors as proxies for the real data, so after running k-means I need to read the cluster assignments programmatically and transfer them back to the original data. I know about the clusterdump tool, but to be honest I'm having trouble interpreting its output, and I'm unsure how I would write out the cluster assignments from my own program. For compatibility, clusterdump's format would seem ideal, but I'm not sure how to produce it when the cluster assignments are only proxies for the original points. Any thoughts on this would be wonderful.
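One way to picture the transfer back to the original data is the sketch below. It assumes row i of the eigenvector-based matrix that k-means clustered was built from original item i, and that the k-means output has already been reduced to an array of cluster ids indexed by row. `ProxyAssignments` and `groupOriginals` are hypothetical names, and the tab-separated printout is only an illustration, not the clusterdump format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProxyAssignments {

    /**
     * Groups original items by the cluster id assigned to their proxy row.
     * Assumption: rowCluster[i] is the cluster of embedding row i, which came from originals.get(i).
     */
    public static <T> Map<Integer, List<T>> groupOriginals(List<T> originals, int[] rowCluster) {
        if (originals.size() != rowCluster.length) {
            throw new IllegalArgumentException("one cluster id per original item is required");
        }
        // LinkedHashMap keeps clusters in the order they are first seen
        Map<Integer, List<T>> clusters = new LinkedHashMap<>();
        for (int i = 0; i < originals.size(); i++) {
            clusters.computeIfAbsent(rowCluster[i], c -> new ArrayList<>()).add(originals.get(i));
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("doc-a", "doc-b", "doc-c", "doc-d");
        int[] rowCluster = {1, 0, 1, 0}; // cluster id assigned to embedding row i
        Map<Integer, List<String>> byCluster = groupOriginals(docs, rowCluster);
        // One line per cluster: the cluster id, then its original members
        byCluster.forEach((id, members) -> System.out.println(id + "\t" + members));
    }
}
```

The only link between the two spaces in this sketch is the row index, so it relies on the embedding rows never being reordered between building the matrix and reading the k-means output.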

Thank you!

Shannon
