[ https://issues.apache.org/jira/browse/MAHOUT-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923711#action_12923711 ]

Shannon Quinn commented on MAHOUT-516:
--------------------------------------

For the time being, I'm just going to go with a -k type flag for specifying the 
degree of eigendecomposition. But here are my thoughts on a more permanent 
solution:

In reading up a little further on the low-rank approximations in the Eigencuts 
paper, it appears that, at least for images, the eigenvalues decrease roughly 
linearly from 1, i.e. each eigenvalue is <= the previous one, and the drop 
follows an approximately linear function. When perturbing the flow of 
probability in the underlying Markov transition graph (in order to determine 
where the clusters are), any eigenvalue/eigenvector pairs whose eigenvalue falls 
under a certain threshold (determined by a combination of epsilon and beta, 
which are command-line arguments) are ignored. Thus, since the eigenvalues are 
monotonically decreasing, in theory we'd only need to find the first eigenvalue 
that falls beneath the threshold and perform the decomposition only up to that 
rank.
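If the eigenvalues were already known and sorted in non-increasing order, the 
cutoff rank would simply be the number of eigenvalues at or above the threshold, 
and monotonicity lets a binary search find it. A minimal sketch; the class and 
method names are hypothetical, and the threshold is taken as a given number 
because exactly how epsilon and beta combine into it is not spelled out here.

```java
class EigenRankCutoff {

    /**
     * Returns how many leading eigenvalues are >= threshold.
     * Assumes eigenvaluesDescending is sorted in non-increasing order;
     * the binary search is only valid under that monotonicity assumption.
     */
    static int rankAboveThreshold(double[] eigenvaluesDescending, double threshold) {
        int lo = 0;
        int hi = eigenvaluesDescending.length; // the answer lies in [lo, hi]
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (eigenvaluesDescending[mid] >= threshold) {
                lo = mid + 1; // eigenvalue at mid is kept, so rank > mid
            } else {
                hi = mid; // first eigenvalue below threshold is at mid or earlier
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Hypothetical spectrum decreasing roughly linearly from 1.
        double[] eigenvalues = {1.0, 0.92, 0.85, 0.77, 0.70, 0.61, 0.55};
        // Hypothetical threshold; in Eigencuts it would come from epsilon and beta.
        double threshold = 0.75;
        System.out.println(rankAboveThreshold(eigenvalues, threshold)); // prints 4
    }
}
```

Of course this presupposes the very eigenvalues we are trying to avoid 
computing, which is the problem below.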

There's an obvious implementation problem there: we can't really know what that 
minimum degree is without performing a full decomposition in the first place. 
Is there a way around this? Do we have an efficient way of calculating, or 
perhaps approximating, eigenvalues without computing the corresponding 
eigenvectors or otherwise performing a full decomposition? Maybe we could even do this 
probabilistically by "sampling" from the space of eigenvalues to make a guess 
on what rank we want? Just throwing ideas out here until the experts respond :)
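One standard way to get eigenvalue estimates without eigenvectors is the 
Lanczos iteration: after m steps it yields a small tridiagonal matrix T whose 
eigenvalues (Ritz values) approximate the extreme eigenvalues of a symmetric 
matrix, and a Sturm-sequence count on T gives the number of Ritz values at or 
above a threshold, again without forming any eigenvectors. The sketch below is a 
single-machine illustration on a dense in-memory matrix, not Mahout code; the 
class and method names are hypothetical, and it assumes a symmetric input such 
as the normalized affinity matrix, which has the same eigenvalues as the Markov 
transition matrix. In exact arithmetic, Cauchy interlacing means the count from 
a partial run never exceeds the true count and never decreases as more steps 
are taken, so it is a lower bound on the rank we want.

```java
import java.util.Arrays;
import java.util.Random;

class LanczosEigenvalueCount {

    /** y = A x for a dense matrix A. */
    static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[x.length];
        for (int i = 0; i < a.length; i++) {
            double s = 0.0;
            for (int j = 0; j < x.length; j++) s += a[i][j] * x[j];
            y[i] = s;
        }
        return y;
    }

    static double dot(double[] u, double[] v) {
        double s = 0.0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    /**
     * Runs up to m Lanczos steps on symmetric A with full reorthogonalization
     * and returns T as {diagonal, offDiagonal}. Only products A*q are used.
     */
    static double[][] lanczos(double[][] a, int m, long seed) {
        int n = a.length;
        m = Math.min(m, n);
        double[][] q = new double[m][];
        double[] alpha = new double[m];
        double[] beta = new double[m];
        Random rnd = new Random(seed);
        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = rnd.nextGaussian(); // random start vector
        double norm = Math.sqrt(dot(v, v));
        for (int i = 0; i < n; i++) v[i] /= norm;
        int steps = 0;
        while (steps < m) {
            q[steps] = v;
            double[] w = multiply(a, v);
            alpha[steps] = dot(w, v);
            // Gram-Schmidt against every previous Lanczos vector keeps Q orthonormal.
            for (int k = 0; k <= steps; k++) {
                double c = dot(w, q[k]);
                for (int i = 0; i < n; i++) w[i] -= c * q[k][i];
            }
            beta[steps] = Math.sqrt(dot(w, w));
            steps++;
            if (beta[steps - 1] < 1e-12) break; // Krylov space exhausted
            v = new double[n];
            for (int i = 0; i < n; i++) v[i] = w[i] / beta[steps - 1];
        }
        return new double[][] {
            Arrays.copyOf(alpha, steps), Arrays.copyOf(beta, Math.max(steps - 1, 0))
        };
    }

    /** Sturm count: eigenvalues of the symmetric tridiagonal (diag, off) below x. */
    static int countBelow(double[] diag, double[] off, double x) {
        int count = 0;
        double d = 1.0;
        for (int i = 0; i < diag.length; i++) {
            double b2 = (i == 0) ? 0.0 : off[i - 1] * off[i - 1];
            d = (diag[i] - x) - b2 / d; // pivot of LDL^T of (T - xI)
            if (d == 0.0) d = 1e-300;  // avoid division by zero in the next step
            if (d < 0.0) count++;       // negative pivots = eigenvalues below x
        }
        return count;
    }

    /** Number of Ritz values >= threshold after m Lanczos steps. */
    static int countAbove(double[][] a, int m, double threshold, long seed) {
        double[][] t = lanczos(a, m, seed);
        return t[0].length - countBelow(t[0], t[1], threshold);
    }

    public static void main(String[] args) {
        // Diagonal test matrix so the true spectrum is known: 1.0, 0.9, ..., 0.3.
        int n = 8;
        double[][] a = new double[n][n];
        for (int i = 0; i < n; i++) a[i][i] = 1.0 - 0.1 * i;
        System.out.println(countAbove(a, 4, 0.75, 42L)); // partial run: at most 3
        System.out.println(countAbove(a, n, 0.75, 42L)); // full run: should print 3
    }
}
```

Whether a handful of steps is enough to pin down the cutoff for a real 
affinity matrix is an open question; the sketch only shows that the count can 
be obtained from matrix-vector products alone.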

> Eigencuts produces unexpected results
> -------------------------------------
>
>                 Key: MAHOUT-516
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-516
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.4
>            Reporter: Jeff Eastman
>             Fix For: 0.5
>
>         Attachments: jeastman.vcf
>
>
> Shannon reports he suspects a logic error in Eigencuts since it evidently 
> does not produce exactly the expected results. It passes all current unit 
> tests so we need to characterize the results differences and produce a test 
> for it. Marking for 0.5 for now though we will fix it as soon as possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
