[ 
https://issues.apache.org/jira/browse/MAHOUT-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091528#comment-13091528
 ] 

Nathan Halko commented on MAHOUT-796:
-------------------------------------

The lower the condition number (or the lower the signal-to-noise ratio), the 
harder it is to extract the top k singular vectors, because in a sense they 
are not that much more important than the other n-k.  We see pollution from 
the smaller n-k singular directions, and that degrades our approximation of 
the top k space.  Power iterations (just a few) are extremely important to 
amplify the gap between the important directions and the unimportant ones.  
Instead of sampling matrix A, we sample matrix (AA*)^q A, which has the same 
singular vectors but an exaggerated spectrum:

   sigma^{2q+1}
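
To spell out why the singular vectors are unchanged (a short derivation, 
assuming the thin SVD A = U Sigma V* with U*U = I):

   AA*       = U Sigma^2 U*
   (AA*)^q A = U Sigma^{2q} U* U Sigma V* = U Sigma^{2q+1} V*

So each singular value sigma_j becomes sigma_j^{2q+1}.  As an illustration, 
a ratio sigma_j/sigma_1 = 0.5 shrinks to 0.5^3 = 0.125 at q=1 and to 
0.5^5 = 0.03125 at q=2.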

In infinite precision there would be no need to orthogonalize between 
iterations, only at the last step.  However, in finite precision, the small 
singular values can fall below machine precision when raised to the (2q+1)th 
power, and we won't be able to recover them accurately.  Orthogonalizing also 
prevents overflow if your matrix has a very large sigma_max.  It is mostly a 
precaution to keep from losing information, and in most cases it could 
probably be skipped or done only intermittently.  If orthogonalization is a 
bottleneck, we could consider not doing it.
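
The scheme described above can be sketched in a few lines of NumPy.  This is 
only an illustrative, in-memory sketch and not the Mahout SSVD code: the 
function names (randomized_range, approx_svd), the oversampling parameter p, 
the seed, and the orthogonalize switch are all assumptions made for the 
example, and A is assumed to be real so that A* is just A.T.  The switch lets 
you compare orthogonalizing after every multiplication against orthogonalizing 
only once at the end.

```python
import numpy as np


def randomized_range(A, k, p=5, q=2, orthogonalize=True, seed=0):
    """Return an orthonormal basis Q approximating the top-k left singular
    subspace of A, using q power iterations (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Gaussian test matrix with k + p columns (p = oversampling).
    omega = rng.standard_normal((n, k + p))
    Y = A @ omega
    if orthogonalize:
        Y, _ = np.linalg.qr(Y)
    for _ in range(q):
        # One power iteration: Y <- (A A^T) Y, done as two multiplications.
        Z = A.T @ Y
        if orthogonalize:
            # Re-orthogonalize so small directions are not lost to rounding.
            Z, _ = np.linalg.qr(Z)
        Y = A @ Z
        if orthogonalize:
            Y, _ = np.linalg.qr(Y)
    # Final orthogonalization; this step is needed in either mode.
    Q, _ = np.linalg.qr(Y)
    return Q


def approx_svd(A, k, **kwargs):
    """Approximate rank-k SVD of A from the sampled range (hypothetical helper)."""
    Q = randomized_range(A, k, **kwargs)
    B = Q.T @ A                      # small (k + p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```

Calling approx_svd(A, k=5, q=0) and approx_svd(A, k=5, q=3) on a matrix with 
a slowly decaying spectrum is one way to see the effect of q on the leading 
singular values; passing orthogonalize=False gives the variant that only 
orthogonalizes at the last step.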



> Modified power iterations in existing SSVD code
> -----------------------------------------------
>
>                 Key: MAHOUT-796
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-796
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.5
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>              Labels: SSVD
>             Fix For: 0.6
>
>
> Nathan Halko contacted me and pointed out the importance of making power 
> iterations available, and their significant effect on the accuracy of 
> smaller eigenvalues and on noise attenuation. 
> Essentially, we would like to introduce yet another job parameter, q, that 
> governs the number of optional power iterations. The suggested modification 
> to the algorithm is outlined here: 
> https://github.com/dlyubimov/ssvd-lsi/wiki/Power-iterations-scratchpad .
> Note that it differs from the original power iterations formula in the paper 
> in that an additional orthogonalization is performed after each 
> iteration. Nathan points out that this reduces errors in the smaller 
> eigenvalues a lot (if I interpret it right). 


        
