It would have to be in the LanczosSolver itself, though - seems like kind of a special case to be modifying the black box?

On 10/22/2010 2:10 AM, Ted Dunning wrote:
There isn't a good way to guess eigenvalues.

But the basic decomposer can see progressively more eigenvalues as it goes
and should be able to know when to stop.

I can't speak to where in the code you would stick that, but it should be
reasonably easy to find.

On Thu, Oct 21, 2010 at 5:33 PM, Shannon Quinn (JIRA)<[email protected]>wrote:

    [
https://issues.apache.org/jira/browse/MAHOUT-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923711#action_12923711]

Shannon Quinn commented on MAHOUT-516:
--------------------------------------

For the time being, I'm just going to go with a -k type flag for specifying
the degree of eigendecomposition. But here are my thoughts on a more
permanent solution:

In reading up a little further on the low-rank approximations of the
eigencuts paper, it appears that, at least for images, the eigenvalues
follow a linear decrease from 1, i.e. each corresponding eigenvalue is<=
the previous according to some approximately linear function. In perturbing
the flow of probability in the underlying markov transition graph (in order
to determine where the clusters are), any eigenvalue/eigenvector pairs that
fall under a certain threshold (specified by a combination of epsilon and
beta, which are command-line arguments) are ignored. Thus, since the
eigenvalues are monotonically decreasing, in theory we'd only need to find
which eigenvalue falls beneath the threshold and perform a full
decomposition up to that point.

There's an obvious implementation problem there: we can't really know what
that minimum degree is without performing a full decomposition in the first
place. Is there a way around this? Do we have an efficient way of
calculating, or perhaps approximating, eigenvalues without computing
corresponding eigenvectors or otherwise performing a full decomposition?
Maybe we could even do this probabilistically by "sampling" from the space
of eigenvalues to make a guess on what rank we want? Just throwing ideas out
here until the experts respond :)

Eigencuts produces unexpected results
-------------------------------------

                 Key: MAHOUT-516
                 URL: https://issues.apache.org/jira/browse/MAHOUT-516
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.4
            Reporter: Jeff Eastman
             Fix For: 0.5

         Attachments: jeastman.vcf


Shannon reports he suspects a logic error in Eigencuts since it evidently
does not produce exactly the expected results. It passes all current unit
tests so we need to characterize the results differences and produce a test
for it. Marking for 0.5 for now though we will fix it as soon as possible.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Reply via email to