Jake?

On Tue, Jan 18, 2011 at 11:50 PM, Sean Owen (JIRA) <[email protected]> wrote:

>
>    [
> https://issues.apache.org/jira/browse/MAHOUT-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983592#action_12983592]
>
> Sean Owen commented on MAHOUT-369:
> ----------------------------------
>
> This one's also been on the shelf for about 4 months. Is it ready to go, or
> should it be archived?
>
> > Issues with DistributedLanczosSolver output
> > -------------------------------------------
> >
> >                 Key: MAHOUT-369
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-369
> >             Project: Mahout
> >          Issue Type: Bug
> >          Components: Math
> >    Affects Versions: 0.3, 0.4
> >            Reporter: Danny Leshem
> >            Assignee: Jake Mannix
> >             Fix For: 0.5
> >
> >         Attachments: MAHOUT-369.patch
> >
> >
> > DistributedLanczosSolver (line 99) claims to persist
> > eigenVectors.numRows() vectors.
> > {code}
> >     log.info("Persisting " + eigenVectors.numRows() + " eigenVectors and eigenValues to: " + outputPath);
> > {code}
> > However, a few lines later (line 106) we have
> > {code}
> >     for(int i=0; i<eigenVectors.numRows() - 1; i++) {
> >         ...
> >     }
> > {code}
> > which only persists eigenVectors.numRows()-1 vectors.
> > It seems the most significant eigenvector (i.e. the one with the
> > largest eigenvalue) is omitted. Is this an off-by-one bug?
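
The mismatch described above can be reproduced without Mahout. The following is a minimal sketch, not the solver's actual code: the class name, the `persistedRows` helper, and the row count of 5 are hypothetical, and only the loop bound `numRows - 1` is taken from the quoted snippet. It does not model which eigenvector sits in the skipped row, since that depends on the solver's ordering.

```java
import java.util.ArrayList;
import java.util.List;

class LanczosPersistDemo {
    // Collects the row indices a persist loop would visit.
    // buggy == true reproduces the bound quoted from line 106 (numRows - 1);
    // buggy == false iterates over every row.
    static List<Integer> persistedRows(int numRows, boolean buggy) {
        int bound = buggy ? numRows - 1 : numRows;
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < bound; i++) {
            rows.add(i);
        }
        return rows;
    }

    public static void main(String[] args) {
        int numRows = 5; // hypothetical stand-in for eigenVectors.numRows()
        System.out.println("count in log message: " + numRows);
        System.out.println("rows with quoted bound: " + persistedRows(numRows, true));
        System.out.println("rows with full bound:   " + persistedRows(numRows, false));
    }
}
```

With the quoted bound the last row index never appears, so the count in the log message and the number of rows written differ by one.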
> > Also, I think it would be better if the eigenvectors were persisted in
> > *reverse* order, meaning the most significant vector is marked "0", the
> > 2nd most significant is marked "1", etc.
> > This is for two reasons:
> > 1) When performing another PCA on the same corpus (say, with more
> > principal components), corresponding eigenvalues can be easily matched
> > and compared.
> > 2) It makes it easier to discard the least significant principal
> > components, which for Lanczos decomposition are usually garbage.
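
The proposed labelling can be sketched as follows. This is an illustration under stated assumptions, not the attached patch: it assumes the eigenvalues arrive sorted in ascending order, and the class and method names (`ReverseOrderDemo`, `relabelDescending`, `keepTop`) and the sample values are hypothetical.

```java
import java.util.Arrays;

class ReverseOrderDemo {
    // Returns a copy in which index 0 holds the largest eigenvalue.
    // Assumes the input is sorted ascending.
    static double[] relabelDescending(double[] ascending) {
        int n = ascending.length;
        double[] out = new double[n];
        for (int label = 0; label < n; label++) {
            out[label] = ascending[n - 1 - label];
        }
        return out;
    }

    // Keeps the k most significant entries of a descending array.
    // Discarding the tail is a plain truncation once label 0 is the largest.
    static double[] keepTop(double[] descending, int k) {
        int m = Math.max(0, Math.min(k, descending.length));
        return Arrays.copyOf(descending, m);
    }

    public static void main(String[] args) {
        double[] ascending = {0.1, 2.0, 7.5}; // hypothetical eigenvalues
        double[] labelled = relabelDescending(ascending);
        System.out.println(Arrays.toString(labelled));
        System.out.println(Arrays.toString(keepTop(labelled, 2)));
    }
}
```

With this labelling, a second run that requests more components keeps the same labels for the leading eigenvalues, which is what makes the comparison in point 1 straightforward.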
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
