The flatter the spectrum (a low condition number, or equivalently a low
signal-to-noise ratio), the harder it is to extract the top k singular
vectors, because in a sense they are not that much more important than the
other n-k.  We see pollution from the smaller n-k singular directions, and
that degrades our approximation of the top-k space.  Power iterations (just
a few) are extremely important for amplifying the gap between the important
directions and the unimportant ones.  Instead of sampling matrix A, we
sample matrix (AA*)^q A, which has the same singular vectors but an
exaggerated spectrum: each singular value sigma_i of A becomes

   sigma_i^{2q+1}
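
To make the spectrum claim concrete, here is a minimal NumPy sketch (not
taken from the SSVD code).  The sizes, the spectrum and the names U, V,
sigma and B are made up for illustration.  It builds a small A with known
singular values and checks that (AA*)^q A has singular values
sigma_i^{2q+1}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and spectrum (assumptions, not from the SSVD code).
m, n, q = 8, 5, 2
sigma = np.array([3.0, 2.5, 2.0, 1.5, 1.0])  # slowly decaying spectrum

# Random orthonormal factors, so that A = U diag(sigma) V^T exactly.
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U * sigma) @ V.T

# B = (A A^T)^q A = U diag(sigma^(2q+1)) V^T: same singular vectors as A.
B = np.linalg.matrix_power(A @ A.T, q) @ A

sigma_B = np.linalg.svd(B, compute_uv=False)

# Gap between the largest and smallest singular value, before and after.
ratio_A = sigma[0] / sigma[-1]
ratio_B = sigma_B[0] / sigma_B[-1]

print("singular values of B:", sigma_B)
print("sigma^(2q+1):        ", sigma ** (2 * q + 1))
print("gap in A:", ratio_A, " gap in B:", ratio_B)
```

The gap between any two singular values is raised to the same power 2q+1,
which is why even a small q helps when the spectrum is flat.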

In infinite precision there would be no need to orthogonalize between
iterations, only at the last step.  However, in finite precision, the small
singular values, once raised to the (2q+1)-th power, can fall below machine
precision relative to the largest one, and we won't be able to accurately
recover them.  Orthogonalizing between iterations also prevents overflow if
your matrix has a very large sigma_max.  It is mostly a precaution to keep
from losing information, and in most cases it could probably be skipped or
done only intermittently.  If orthogonalization turns out to be a
bottleneck, we could consider not doing it.
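
A rough way to see the trade-off is the sketch below.  It is a plain NumPy
illustration, not the Mahout SSVD implementation.  The helper names
range_basis and subspace_error, the test matrix and the choice of QR as the
orthogonalization step are all assumptions made for the example.  The
spectrum is deliberately extreme (sigma_i = 10^-i) so that sigma_i^{2q+1}
drops below double precision for the smaller directions.

```python
import numpy as np

def range_basis(A, k, q, orthonormalize=True, seed=0):
    """Orthonormal basis approximating the top-k left singular space of A.

    Runs q power iterations on a Gaussian sample.  When orthonormalize is
    True, a QR step is applied before every multiplication by A or A^T;
    otherwise only the final QR is done.
    """
    rng = np.random.default_rng(seed)
    Y = A @ rng.standard_normal((A.shape[1], k))
    for _ in range(q):
        if orthonormalize:
            Y, _ = np.linalg.qr(Y)
        Z = A.T @ Y
        if orthonormalize:
            Z, _ = np.linalg.qr(Z)
        Y = A @ Z
    Q, _ = np.linalg.qr(Y)  # the last-step orthogonalization is always done
    return Q

def subspace_error(Q, U_k):
    """Spectral norm of the part of U_k lying outside span(Q)."""
    return np.linalg.norm(U_k - Q @ (Q.T @ U_k), 2)

# Hypothetical test matrix with a very fast-decaying spectrum.
rng = np.random.default_rng(1)
m, n, k, q = 40, 12, 6, 3
sigma = 10.0 ** -np.arange(n)
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (U * sigma) @ V.T
U_k = U[:, :k]

Q_with = range_basis(A, k, q, orthonormalize=True)
Q_without = range_basis(A, k, q, orthonormalize=False)
err_with = subspace_error(Q_with, U_k)
err_without = subspace_error(Q_without, U_k)

print("error with orthogonalization between iterations:", err_with)
print("error with orthogonalization only at the end:   ", err_without)
```

With a spectrum this steep, the run without intermediate orthogonalization
should lose the weaker of the top-k directions.  With a flat spectrum
(where power iterations matter most) the difference should be much
smaller, which is consistent with skipping the step or doing it only
intermittently.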


> > Modified power iterations in existing SSVD code
> > -----------------------------------------------
> >
> >                 Key: MAHOUT-796
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-796
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Math
> >    Affects Versions: 0.5
> >            Reporter: Dmitriy Lyubimov
> >            Assignee: Dmitriy Lyubimov
> >              Labels: SSVD
> >             Fix For: 0.6
> >
> >
>
