[ https://issues.apache.org/jira/browse/MAHOUT-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923711#action_12923711 ]
Shannon Quinn commented on MAHOUT-516:
--------------------------------------
For the time being, I'm just going to go with a -k style flag for specifying
the rank of the eigendecomposition. But here are my thoughts on a more
permanent solution:

Reading a little further on the low-rank approximations in the Eigencuts
paper, it appears that, at least for images, the eigenvalues decrease roughly
linearly from 1, i.e. each eigenvalue is <= the previous one according to some
approximately linear function. When perturbing the flow of probability in the
underlying Markov transition graph (in order to determine where the clusters
are), any eigenvalue/eigenvector pairs that fall under a certain threshold
(specified by a combination of epsilon and beta, which are command-line
arguments) are ignored. Thus, since the eigenvalues are monotonically
decreasing, in theory we'd only need to find the first eigenvalue that falls
beneath the threshold and perform the decomposition only up to that rank.
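
As a rough illustration of the thresholding idea above, here is a minimal
Python sketch that picks the rank from an already-known, non-increasing list
of eigenvalues. The function name, the sample spectrum, and the single
`threshold` value are all hypothetical; how epsilon and beta actually combine
into the threshold is not reproduced here.

```python
def rank_for_threshold(eigenvalues, threshold):
    """Return how many leading eigenvalues are >= threshold.

    Assumes `eigenvalues` is sorted in non-increasing order, so the scan
    can stop at the first value that falls beneath the threshold.
    """
    k = 0
    for value in eigenvalues:
        if value < threshold:
            break
        k += 1
    return k


# Hypothetical spectrum that decays roughly linearly from 1.
spectrum = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
print(rank_for_threshold(spectrum, 0.75))  # prints 3
```

This only shows the selection step; it presumes the eigenvalues are already
available.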

There's an obvious implementation problem there: we can't really know what that
minimum rank is without performing a full decomposition in the first place.
Is there a way around this? Do we have an efficient way of calculating, or
perhaps approximating, eigenvalues without computing the corresponding
eigenvectors or otherwise performing a full decomposition? Maybe we could even
do this probabilistically by "sampling" from the space of eigenvalues to make a
guess at what rank we want? Just throwing ideas out here until the experts
respond :)
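
One possible direction for the question above, sketched in Python under a
strong assumption: the matrix has already been reduced to symmetric
tridiagonal form (for example by a Lanczos-style iteration). In that case the
number of eigenvalues above a threshold can be counted from the signs of the
LDL^T pivots of the shifted matrix (Sylvester's law of inertia), without
computing any eigenvalue or eigenvector explicitly. The function name, the
`tiny` safeguard, and the sample matrix are hypothetical; this is not a
description of what Mahout does.

```python
def count_eigenvalues_above(diagonal, off_diagonal, threshold, tiny=1e-30):
    """Count eigenvalues of a symmetric tridiagonal matrix above threshold.

    `diagonal` holds the n main-diagonal entries and `off_diagonal` the
    n - 1 entries next to it. The pivots of the LDL^T factorization of
    (T - threshold * I) have as many positive entries as T has eigenvalues
    greater than threshold, so only their signs are needed.
    """
    if not diagonal:
        return 0
    if len(off_diagonal) != len(diagonal) - 1:
        raise ValueError("off_diagonal must have len(diagonal) - 1 entries")

    count = 0
    pivot = 0.0
    for i, entry in enumerate(diagonal):
        new_pivot = entry - threshold
        if i > 0:
            new_pivot -= off_diagonal[i - 1] ** 2 / pivot
        if new_pivot == 0.0:
            # Arbitrary small value to avoid dividing by zero on the next step.
            new_pivot = tiny
        if new_pivot > 0.0:
            count += 1
        pivot = new_pivot
    return count


# Hypothetical 3x3 example; its eigenvalues are roughly 0.854, 0.5 and 0.146.
diag = [0.5, 0.5, 0.5]
off = [0.25, 0.25]
print(count_eigenvalues_above(diag, off, 0.4))  # prints 2
```

Each count costs O(n) for an n-by-n tridiagonal matrix, so the rank needed for
a given threshold could be read off directly. Whether the reduction to
tridiagonal form is cheap enough in practice is a separate question.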
> Eigencuts produces unexpected results
> -------------------------------------
>
> Key: MAHOUT-516
> URL: https://issues.apache.org/jira/browse/MAHOUT-516
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.4
> Reporter: Jeff Eastman
> Fix For: 0.5
>
> Attachments: jeastman.vcf
>
>
> Shannon reports he suspects a logic error in Eigencuts since it evidently
> does not produce exactly the expected results. It passes all current unit
> tests so we need to characterize the results differences and produce a test
> for it. Marking for 0.5 for now though we will fix it as soon as possible.