Re: [jira] Commented: (MAHOUT-516) Eigencuts produces unexpected results

Ted Dunning Fri, 22 Oct 2010 11:23:35 -0700

Sounds more like new requirements requiring new behavior to me.

This is open source.  If you see a way to make the box better, you are free
to do so.


On Fri, Oct 22, 2010 at 5:10 AM, Shannon Quinn <[email protected]> wrote:

> It would have to be in the LanczosSolver itself, though - seems like kind
> of a special case to be modifying the black box?
>
>
> On 10/22/2010 2:10 AM, Ted Dunning wrote:
>
>> There isn't a good way to guess eigenvalues.
>>
>> But the basic decomposer can see progressively more eigenvalues as it goes
>> and should be able to know when to stop.
>>
>> I can't speak to where in the code you would stick that, but it should be
>> reasonably easy to find.
>>
>> On Thu, Oct 21, 2010 at 5:33 PM, Shannon Quinn (JIRA)<[email protected]
>> >wrote:
>>
>>     [
>>>
>>> https://issues.apache.org/jira/browse/MAHOUT-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923711#action_12923711
>>> ]
>>>
>>> Shannon Quinn commented on MAHOUT-516:
>>> --------------------------------------
>>>
>>> For the time being, I'm just going to go with a -k type flag for
>>> specifying
>>> the degree of eigendecomposition. But here are my thoughts on a more
>>> permanent solution:
>>>
>>> In reading up a little further on the low-rank approximations of the
>>> eigencuts paper, it appears that, at least for images, the eigenvalues
>>> follow a linear decrease from 1, i.e. each corresponding eigenvalue is<=
>>> the previous according to some approximately linear function. In
>>> perturbing
>>> the flow of probability in the underlying markov transition graph (in
>>> order
>>> to determine where the clusters are), any eigenvalue/eigenvector pairs
>>> that
>>> fall under a certain threshold (specified by a combination of epsilon and
>>> beta, which are command-line arguments) are ignored. Thus, since the
>>> eigenvalues are monotonically decreasing, in theory we'd only need to
>>> find
>>> which eigenvalue falls beneath the threshold and perform a full
>>> decomposition up to that point.
>>>
>>> There's an obvious implementation problem there: we can't really know
>>> what
>>> that minimum degree is without performing a full decomposition in the
>>> first
>>> place. Is there a way around this? Do we have an efficient way of
>>> calculating, or perhaps approximating, eigenvalues without computing
>>> corresponding eigenvectors or otherwise performing a full decomposition?
>>> Maybe we could even do this probabilistically by "sampling" from the
>>> space
>>> of eigenvalues to make a guess on what rank we want? Just throwing ideas
>>> out
>>> here until the experts respond :)
>>>
>>>  Eigencuts produces unexpected results
>>>> -------------------------------------
>>>>
>>>>                 Key: MAHOUT-516
>>>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-516
>>>>             Project: Mahout
>>>>          Issue Type: Bug
>>>>          Components: Clustering
>>>>    Affects Versions: 0.4
>>>>            Reporter: Jeff Eastman
>>>>             Fix For: 0.5
>>>>
>>>>         Attachments: jeastman.vcf
>>>>
>>>>
>>>> Shannon reports he suspects a logic error in Eigencuts since it
>>>> evidently
>>>>
>>> does not produce exactly the expected results. It passes all current unit
>>> tests so we need to characterize the results differences and produce a
>>> test
>>> for it. Marking for 0.5 for now though we will fix it as soon as
>>> possible.
>>>
>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>>
>>>
>>>
>

Re: [jira] Commented: (MAHOUT-516) Eigencuts produces unexpected results

Reply via email to