[ 
https://issues.apache.org/jira/browse/MAHOUT-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095146#comment-13095146
 ] 

Dmitriy Lyubimov commented on MAHOUT-796:
-----------------------------------------

OK, the first implementation with QR solvers is ready, and I added a -q 
parameter. (It is all in the git remote [email protected]:dlyubimov/mahout-commits, 
branch MAHOUT-796.)

I did not have time to test the distributed version or larger inputs. On a toy 
1000x2000 input with k=3, p=10 (optimal singular values 10, 4, 1, then 0.1): 

q=0: 
--SSVD solver singular values:
svs: 9.998472  3.993542  0.990456  0.100000  0.100000  0.100000  0.100000  
0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  

q=1: (+2 more sequential steps):
--SSVD solver singular values:
svs: 10.000000  4.000000  0.999999  0.100000  0.100000  0.100000  0.100000  
0.100000  0.100000  0.100000  0.100000  0.100000  0.100000  

So, much better (although much slower as well). 
Of course, I understand that each run exhibits noise, so to prove it works 
consistently better I need more than just 2 runs. But that's encouraging. 
It worked (and actually on the first attempt)!
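
To put a number on "much better", the absolute errors of the first k=3 singular 
values can be computed against the optimal values 10, 4, 1. This is only 
arithmetic on the two runs reported above, not a new measurement; the names 
abs_errors, reported and optimal are made up for this illustration.

```python
# Leading k=3 singular values copied from the two runs reported above.
optimal = [10.0, 4.0, 1.0]
reported = {
    0: [9.998472, 3.993542, 0.990456],   # q=0
    1: [10.000000, 4.000000, 0.999999],  # q=1
}


def abs_errors(values, reference):
    """Absolute error of each singular value against the reference value."""
    return [abs(v - r) for v, r in zip(values, reference)]


errors = {q: abs_errors(svs, optimal) for q, svs in reported.items()}
for q, errs in sorted(errors.items()):
    print("q=%d:" % q, "  ".join("%.6f" % e for e in errs))
```

For q=0 the errors are between 1e-3 and 1e-2, while for q=1 they are at or 
below the 1e-6 print precision. That is still a single pair of runs.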

I also tried some optimization to handle sparse cases a little better; I 
guess it taxes denser cases a little bit.

So this will be put on hold until I add the Cholesky option, and then I will 
have to return to this issue to enable the same scheme with the Y'Y + Cholesky 
path.
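
For readers unfamiliar with the idea, here is a minimal in-memory sketch of 
orthogonalizing Y through the Cholesky factor of Y'Y instead of a direct QR. 
It assumes that "Y'Y + Cholesky path" means the usual construction Y'Y = R'R, 
Q = Y R^-1; it is not the planned Mahout code, and all function names are 
hypothetical. It also assumes Y has full column rank.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def cholesky(g):
    """Lower-triangular L with L L' = g (g symmetric positive definite)."""
    n = len(g)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = s ** 0.5 if i == j else s / low[j][j]
    return low


def cholesky_orthonormalize(y):
    """Return (Q, R) with Y = Q R and Q'Q = I, computed via Y'Y = L L'."""
    gram = matmul(transpose(y), y)   # Y'Y is small: width x width
    low = cholesky(gram)             # R = L'
    q_rows = []
    for row in y:                    # each row of Q solves L x = row of Y
        x = []
        for i, yi in enumerate(row):
            x.append((yi - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
        q_rows.append(x)
    return q_rows, transpose(low)


y = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
q_mat, r = cholesky_orthonormalize(y)
print(matmul(transpose(q_mat), q_mat))  # should be close to the identity
```

The appeal is that only the small matrix Y'Y has to be factored. The known 
cost is numerical: forming Y'Y squares the condition number of Y, so 
orthogonality degrades faster than with a direct QR on ill-conditioned input.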

I refactored the QR steps into standalone OutputCollector implementations so 
that they can now be pipelined more easily inside mappers and reducers, which 
makes the code much more readable. 

So after a few tests and final fixes I think it is commit-worthy, but it has a 
dependency on Ted's MAHOUT-790 refactoring pushed to trunk.



> Modified power iterations in existing SSVD code
> -----------------------------------------------
>
>                 Key: MAHOUT-796
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-796
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.5
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>              Labels: SSVD
>             Fix For: 0.6
>
>
> Nathan Halko contacted me and pointed out the importance of having power 
> iterations available and their significant effect on the accuracy of smaller 
> eigenvalues and on noise attenuation. 
> Essentially, we would like to introduce yet another job parameter, q, that 
> governs the number of optional power iterations. The suggested modification 
> of the algorithm is outlined here: 
> https://github.com/dlyubimov/ssvd-lsi/wiki/Power-iterations-scratchpad .
> Note that it differs from the original power iterations formula in the paper 
> in that an additional orthogonalization is performed after each iteration. 
> Nathan points out that this improves errors in smaller eigenvalues a lot 
> (if I interpret it right). 
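
The scheme described in the quoted issue, power iterations with an 
orthogonalization after each one, can be sketched in memory as follows. This 
is a pure-Python illustration, not the Mahout MapReduce code. It assumes each 
iteration computes B = Q'A, Y = A B', Q = qr(Y); the linked scratchpad is the 
authority on the exact formula. Gram-Schmidt stands in for the QR solver, Y is 
assumed to have full column rank, and all names are hypothetical.

```python
import random


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def orthonormalize(y):
    """Q with orthonormal columns spanning the columns of y.

    Modified Gram-Schmidt with a second pass, used here in place of QR.
    """
    basis = []
    for v in transpose(y):
        for _ in range(2):
            for u in basis:
                dot = sum(a * b for a, b in zip(u, v))
                v = [a - dot * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        basis.append([a / norm for a in v])
    return transpose(basis)


def range_basis(a, width, q, rng):
    """Orthonormal basis for the range of a, refined by q power iterations."""
    n = len(a[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(n)]
    q_mat = orthonormalize(matmul(a, omega))             # Y = A * Omega
    for _ in range(q):
        b = matmul(transpose(q_mat), a)                  # B = Q'A
        q_mat = orthonormalize(matmul(a, transpose(b)))  # Y = A B', then QR
    return q_mat


def residual(a, q_mat):
    """Frobenius norm of A - Q Q'A."""
    proj = matmul(q_mat, matmul(transpose(q_mat), a))
    return sum((x - y) ** 2
               for ra, rp in zip(a, proj) for x, y in zip(ra, rp)) ** 0.5


rng = random.Random(0)
left = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]
right = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(3)]
a = matmul(left, right)  # 8x6 test matrix of rank 3
for q in (0, 1):
    print("q=%d residual:" % q, residual(a, range_basis(a, 3, q, rng)))
```

The toy matrix has exact rank 3, so both residuals are only rounding noise; 
the sketch shows where the extra orthogonalization sits, not the accuracy 
gain, which shows up on inputs with a slowly decaying spectrum.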

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
