There isn't a good way to guess eigenvalues. But the basic decomposer can see progressively more eigenvalues as it goes and should be able to know when to stop.
I can't speak to where in the code you would stick that, but it should be reasonably easy to find.

On Thu, Oct 21, 2010 at 5:33 PM, Shannon Quinn (JIRA) <[email protected]> wrote:

>
> [
> https://issues.apache.org/jira/browse/MAHOUT-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923711#action_12923711]
>
> Shannon Quinn commented on MAHOUT-516:
> --------------------------------------
>
> For the time being, I'm just going to go with a -k type flag for specifying
> the degree of eigendecomposition. But here are my thoughts on a more
> permanent solution:
>
> In reading up a little further on the low-rank approximations of the
> eigencuts paper, it appears that, at least for images, the eigenvalues
> follow a linear decrease from 1, i.e. each corresponding eigenvalue is <=
> the previous according to some approximately linear function. In perturbing
> the flow of probability in the underlying markov transition graph (in order
> to determine where the clusters are), any eigenvalue/eigenvector pairs that
> fall under a certain threshold (specified by a combination of epsilon and
> beta, which are command-line arguments) are ignored. Thus, since the
> eigenvalues are monotonically decreasing, in theory we'd only need to find
> which eigenvalue falls beneath the threshold and perform a full
> decomposition up to that point.
>
> There's an obvious implementation problem there: we can't really know what
> that minimum degree is without performing a full decomposition in the first
> place. Is there a way around this? Do we have an efficient way of
> calculating, or perhaps approximating, eigenvalues without computing
> corresponding eigenvectors or otherwise performing a full decomposition?
> Maybe we could even do this probabilistically by "sampling" from the space
> of eigenvalues to make a guess on what rank we want? Just throwing ideas out
> here until the experts respond :)
>
> > Eigencuts produces unexpected results
> > -------------------------------------
> >
> > Key: MAHOUT-516
> > URL: https://issues.apache.org/jira/browse/MAHOUT-516
> > Project: Mahout
> > Issue Type: Bug
> > Components: Clustering
> > Affects Versions: 0.4
> > Reporter: Jeff Eastman
> > Fix For: 0.5
> >
> > Attachments: jeastman.vcf
> >
> >
> > Shannon reports he suspects a logic error in Eigencuts since it evidently
> does not produce exactly the expected results. It passes all current unit
> tests so we need to characterize the results differences and produce a test
> for it. Marking for 0.5 for now though we will fix it as soon as possible.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
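To make the "know when to stop" idea from the top of this message concrete, here is a rough sketch. It is not Mahout code and does not use Mahout's decomposer: it swaps in plain power iteration with deflation on a small symmetric positive semidefinite matrix, purely as an assumption for illustration. The names `eigenpairs_until` and `threshold` are hypothetical; `threshold` stands in for whatever cutoff the epsilon and beta arguments would produce. Eigenpairs come out in decreasing order, and the loop stops at the first eigenvalue that falls below the cutoff, so the rank never has to be guessed up front.

```python
import math
import random


def matvec(a, v):
    """Multiply a dense square matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def dominant_eigenpair(a, iters=1000, tol=1e-12, seed=0):
    """Power iteration: approximate the largest eigenpair of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(len(a))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        w = matvec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # Nothing left in the matrix: report a zero eigenvalue.
            return 0.0, v
        w = [x / norm for x in w]
        # Rayleigh quotient of the normalized iterate.
        new_lam = sum(x * y for x, y in zip(w, matvec(a, w)))
        converged = abs(new_lam - lam) < tol
        v, lam = w, new_lam
        if converged:
            break
    return lam, v


def eigenpairs_until(a, threshold):
    """Collect eigenpairs in decreasing order; stop once one drops below threshold.

    Assumes `a` is symmetric positive semidefinite (an assumption of this
    sketch, not something stated in the thread).
    """
    n = len(a)
    work = [row[:] for row in a]  # deflate a copy, leave the input untouched
    pairs = []
    for _ in range(n):
        lam, v = dominant_eigenpair(work)
        if lam < threshold:
            break  # every remaining eigenvalue is smaller still
        pairs.append((lam, v))
        # Deflation: subtract lam * v * v^T so the next pass finds the next pair.
        for i in range(n):
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


if __name__ == "__main__":
    # The eigenvalues of this matrix are 3, 1 and 0.5.
    m = [[2.0, 1.0, 0.0],
         [1.0, 2.0, 0.0],
         [0.0, 0.0, 0.5]]
    for lam, _ in eigenpairs_until(m, threshold=0.8):
        print(round(lam, 6))
```

An iterative solver that yields eigenvalues largest-first could apply the same check inside its main loop. Where that check belongs in the actual Mahout code is left open, as noted above.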
