[
https://issues.apache.org/jira/browse/MAHOUT-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091538#comment-13091538
]
Ted Dunning commented on MAHOUT-796:
------------------------------------
For the in-memory implementations, I think that this is a non-issue. Power
iteration should simply be implemented. In that case, the original form using
Y = (AA')^q A \Omega seems fine and I don't yet quite see how the iteration
that Dmitriy proposes will get the right result. Whichever method is used, it
is a good thing to do.
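As a rough in-memory sketch of what is being compared here, the snippet below forms a basis for the range of Y = (AA')^q A \Omega with NumPy, with an optional re-orthogonalization before each pass. The function name range_basis, its parameters and the fixed seed are assumptions made for illustration. The re-orthogonalized branch is one common variant and is not necessarily the iteration from Dmitriy's scratchpad.

```python
import numpy as np

def range_basis(A, k, q, reorthogonalize=False, seed=0):
    """Orthonormal basis for the range of (A A')^q A Omega (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((A.shape[1], k))  # random n x k test matrix
    Y = A @ omega                                 # m x k sample of the range of A
    for _ in range(q):
        if reorthogonalize:
            # Replace Y by an orthonormal basis of the same column space
            # so that the columns do not collapse toward the top singular vector.
            Y, _ = np.linalg.qr(Y)
        Y = A @ (A.T @ Y)                         # one power iteration: Y <- A A' Y
    Q, _ = np.linalg.qr(Y)                        # final orthonormal basis
    return Q
```

In exact arithmetic both branches span the same subspace, since a QR step does not change the column space of Y. The difference shows up only in floating point, where the plain form loses accuracy in the directions of the smaller singular values as q grows.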
The problems that I see are for the out-of-core case. There, computing A'A
can often give pathologically bad results if the sparsity pattern is highly
skewed. That approach also leads to significant fill-in, which is not a good
thing. On the other hand, multiplying A by anything too large to store in
memory, as B typically is, may be horribly bad as well.
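The fill-in concern can be illustrated with a small toy case. The matrix below is an assumption chosen for the example (an identity matrix with one fully dense row), and gram_fill_in is a hypothetical helper name. Dense arrays are used only to count nonzeros.

```python
import numpy as np

def gram_fill_in(n):
    """Count nonzeros in a skewed sparse A and in A'A (toy illustration)."""
    A = np.eye(n)
    A[0, :] = 1.0              # a single dense row makes the pattern highly skewed
    gram = A.T @ A             # the dense row contributes an all-ones n x n term
    return np.count_nonzero(A), np.count_nonzero(gram)

nnz_A, nnz_gram = gram_fill_in(200)
print(nnz_A, nnz_gram)         # 399 nonzeros in A, 40000 in A'A
```

A row with d nonzeros contributes d^2 entries to A'A, so one heavy row is enough to make the product completely dense.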
The orthogonalization is no big deal since it requires only a single pass
through the data to accumulate the small matrix required for the Cholesky trick.
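A minimal sketch of the Cholesky trick as I read it, assuming the tall matrix Y arrives as a sequence of row blocks: one pass accumulates the small k x k matrix Y'Y, its Cholesky factor L is computed in memory, and Q = Y L'^{-1} then has orthonormal columns. The function name and the block representation are assumptions for illustration. Applying the factor is written as a separate loop here, although it could be folded into whatever pass uses Q next.

```python
import numpy as np

def cholesky_orthogonalize(row_blocks):
    """Orthonormalize the columns of Y, given as a list of row blocks (sketch)."""
    k = row_blocks[0].shape[1]
    gram = np.zeros((k, k))
    for block in row_blocks:        # single pass over the data
        gram += block.T @ block     # accumulate the small k x k matrix Y'Y
    L = np.linalg.cholesky(gram)    # Y'Y = L L', with L lower triangular
    # Q = Y L'^{-1}, computed per block by solving L Q' = Y'.
    return [np.linalg.solve(L, block.T).T for block in row_blocks]
```

This works because Q'Q = L^{-1} (Y'Y) L'^{-1} = I. The catch is that forming Y'Y squares the condition number of Y, so the result is less accurate than a Householder QR when Y is close to rank deficient.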
> Modified power iterations in existing SSVD code
> -----------------------------------------------
>
> Key: MAHOUT-796
> URL: https://issues.apache.org/jira/browse/MAHOUT-796
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.5
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Labels: SSVD
> Fix For: 0.6
>
>
> Nathan Halko contacted me and pointed out the importance of making power
> iterations available, given their significant effect on the accuracy of
> smaller eigenvalues and on noise attenuation.
> Essentially, we would like to introduce yet another job parameter, q, that
> governs the number of optional power iterations. A suggestion for how to
> modify the algorithm is outlined here:
> https://github.com/dlyubimov/ssvd-lsi/wiki/Power-iterations-scratchpad .
> Note that it differs from the original power iteration formula in the paper
> in that an additional orthogonalization is performed after each
> iteration. Nathan points out that this reduces errors in the smaller
> eigenvalues a lot (if I interpret it right).