[
https://issues.apache.org/jira/browse/MAHOUT-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095146#comment-13095146
]
Dmitriy Lyubimov commented on MAHOUT-796:
-----------------------------------------
OK, the first implementation with QR solvers is ready; I added a -q parameter.
(It is all in git remote [email protected]:dlyubimov/mahout-commits, branch
MAHOUT-796.)
I did not have time to test the distributed version or larger inputs. On a toy
1000x2000 input with k=3, p=10 (optimal values 10, 4, 1, then 0.1 for the rest):
q=0:
--SSVD solver singular values:
svs: 9.998472 3.993542 0.990456 0.100000 0.100000 0.100000 0.100000
0.100000 0.100000 0.100000 0.100000 0.100000 0.100000
q=1: (+2 more sequential steps):
--SSVD solver singular values:
svs: 10.000000 4.000000 0.999999 0.100000 0.100000 0.100000 0.100000
0.100000 0.100000 0.100000 0.100000 0.100000 0.100000
So, much better (although much slower as well).
I of course understand that each run exhibits noise, so to prove that it works
better consistently I need to run more than just 2 attempts. But that's
encouraging. It worked (and actually on the first attempt)!
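
For readers who want to reproduce the q=0 vs q=1 comparison above on a single
machine, here is a minimal NumPy sketch. It is not the Mahout code: the function
names (make_toy, ssvd_singular_values) are made up for illustration, the
iteration follows my reading of the scratchpad (B' = A'Q, Y = A B', then
orthogonalize Y again), and the toy matrix is shrunk to 300x600 to keep it fast.

```python
import numpy as np

def make_toy(m, n, top, tail, rng):
    """Build an m x n matrix (m <= n) whose singular values are `top`, then `tail`."""
    s = np.full(m, tail)
    s[:len(top)] = top
    U, _ = np.linalg.qr(rng.standard_normal((m, m)))  # m x m orthogonal
    V, _ = np.linalg.qr(rng.standard_normal((n, m)))  # n x m, orthonormal columns
    return (U * s) @ V.T                               # U * diag(s) * V'

def ssvd_singular_values(A, k, p, q, rng):
    """Stochastic SVD estimate of the leading k+p singular values of A.

    q is the number of power iterations; Y is re-orthogonalized after each one.
    """
    n = A.shape[1]
    omega = rng.standard_normal((n, k + p))   # random projection matrix
    Q, _ = np.linalg.qr(A @ omega)            # orthonormal basis of Y = A * Omega
    for _ in range(q):
        Bt = A.T @ Q                          # B' = A'Q
        Q, _ = np.linalg.qr(A @ Bt)           # Y = A B', orthogonalized again
    B = Q.T @ A                               # small (k+p) x n matrix
    return np.linalg.svd(B, compute_uv=False)

rng = np.random.default_rng(796)
A = make_toy(300, 600, [10.0, 4.0, 1.0], 0.1, rng)
s0 = ssvd_singular_values(A, k=3, p=10, q=0, rng=rng)
s1 = ssvd_singular_values(A, k=3, p=10, q=1, rng=rng)
print("q=0 svs:", np.round(s0, 6))
print("q=1 svs:", np.round(s1, 6))
```

Each extra power iteration costs two more passes over A here (A'Q and A B'),
which is consistent with the "+2 more sequential steps" noted for q=1.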
I also tried some optimization to handle sparse cases a little better; I guess
it taxes denser cases a little bit.
So this will be put on hold until I add the Cholesky option; then I will have
to return to this issue to enable the same scheme for the Y'Y + Cholesky path.
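
As a rough illustration of what a "Y'Y + Cholesky" path could look like (this
is my assumption about the idea, not the planned Mahout implementation): since
Y = QR implies Y'Y = R'R, the R factor can be recovered from a Cholesky
factorization of the small (k+p) x (k+p) matrix Y'Y, and Q follows as Y R^-1.
The NumPy sketch below checks this against a regular QR on random data; the
sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((1000, 13))      # tall and thin, like Y with k+p = 13

# Reference: ordinary QR of Y.
Q_qr, R_qr = np.linalg.qr(Y)

# Cholesky route: Y'Y = R'R with R upper triangular.
G = Y.T @ Y                              # small (k+p) x (k+p) Gram matrix
R_ch = np.linalg.cholesky(G).T           # NumPy returns lower L with G = L L'
Q_ch = np.linalg.solve(R_ch.T, Y.T).T    # Q = Y R^-1, without forming the inverse

# QR may pick negative diagonal entries in R; Cholesky always picks positive
# ones, so the two R factors agree only up to the signs of their rows.
signs = np.sign(np.diag(R_qr))
r_match = np.allclose(signs[:, None] * R_qr, R_ch)
q_orthonormal = np.allclose(Q_ch.T @ Q_ch, np.eye(13))
print("R factors match up to sign:", r_match)
print("Q from Cholesky route is orthonormal:", q_orthonormal)
```

One caveat with this route: forming Y'Y squares the condition number of Y, so
it is less numerically robust than QR when Y is close to rank-deficient.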
I refactored the QR steps into standalone OutputCollector implementations so
that they can now be pipelined more easily inside mappers and reducers, and the
code is much more readable now.
So after a few tests and final fixes I think it is commit-worthy, but it has a
dependency on Ted's refactoring in MAHOUT-790 pushed to trunk.
> Modified power iterations in existing SSVD code
> -----------------------------------------------
>
> Key: MAHOUT-796
> URL: https://issues.apache.org/jira/browse/MAHOUT-796
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.5
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Labels: SSVD
> Fix For: 0.6
>
>
> Nathan Halko contacted me and pointed out the importance of having power
> iterations available and their significant effect on the accuracy of smaller
> eigenvalues and on noise attenuation.
> Essentially, we would like to introduce yet another job parameter, q, that
> governs the number of optional power iterations. The suggested modification to
> the algorithm is outlined here:
> https://github.com/dlyubimov/ssvd-lsi/wiki/Power-iterations-scratchpad .
> Note that it differs from the original power iterations formula in the paper
> in that an additional orthogonalization is performed after each iteration.
> Nathan points out that this improves errors in smaller eigenvalues a lot (if I
> interpret it right).
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira