It would have to be in the LanczosSolver itself, though - seems like
kind of a special case to be modifying the black box?
On 10/22/2010 2:10 AM, Ted Dunning wrote:
There isn't a good way to guess eigenvalues.
But the basic decomposer can see progressively more eigenvalues as it goes
and should be able to know when to stop.
I can't speak to where in the code you would stick that, but it should be
reasonably easy to find.
On Thu, Oct 21, 2010 at 5:33 PM, Shannon Quinn (JIRA)<[email protected]>wrote:
[
https://issues.apache.org/jira/browse/MAHOUT-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923711#action_12923711]
Shannon Quinn commented on MAHOUT-516:
--------------------------------------
For the time being, I'm just going to go with a -k type flag for specifying
the degree of eigendecomposition. But here are my thoughts on a more
permanent solution:
In reading up a little further on the low-rank approximations of the
eigencuts paper, it appears that, at least for images, the eigenvalues
follow a linear decrease from 1, i.e. each corresponding eigenvalue is<=
the previous according to some approximately linear function. In perturbing
the flow of probability in the underlying markov transition graph (in order
to determine where the clusters are), any eigenvalue/eigenvector pairs that
fall under a certain threshold (specified by a combination of epsilon and
beta, which are command-line arguments) are ignored. Thus, since the
eigenvalues are monotonically decreasing, in theory we'd only need to find
which eigenvalue falls beneath the threshold and perform a full
decomposition up to that point.
There's an obvious implementation problem there: we can't really know what
that minimum degree is without performing a full decomposition in the first
place. Is there a way around this? Do we have an efficient way of
calculating, or perhaps approximating, eigenvalues without computing
corresponding eigenvectors or otherwise performing a full decomposition?
Maybe we could even do this probabilistically by "sampling" from the space
of eigenvalues to make a guess on what rank we want? Just throwing ideas out
here until the experts respond :)
Eigencuts produces unexpected results
-------------------------------------
Key: MAHOUT-516
URL: https://issues.apache.org/jira/browse/MAHOUT-516
Project: Mahout
Issue Type: Bug
Components: Clustering
Affects Versions: 0.4
Reporter: Jeff Eastman
Fix For: 0.5
Attachments: jeastman.vcf
Shannon reports he suspects a logic error in Eigencuts since it evidently
does not produce exactly the expected results. It passes all current unit
tests so we need to characterize the results differences and produce a test
for it. Marking for 0.5 for now though we will fix it as soon as possible.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.