[ https://issues.apache.org/jira/browse/MAHOUT-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy Lyubimov updated MAHOUT-796:
------------------------------------
Attachment: MAHOUT-796.patch
Patch. Local solver tests pass. I also tested multiple splits on sufficiently
large inputs and on sparse inputs with local MR.
I still need to test with an even bigger file and multiple reducers, since local
MR does not support multiple reducers.
What I noticed is that with just one additional power iteration with
orthogonalization, there's practically no need for any oversampling (p). So
yes, power iteration runs more steps, but the overall runtime can be reduced
significantly simply because you no longer need as wide a projection. The
smaller singular values come out pretty accurate even without much oversampling.
Amazing.
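A minimal in-memory NumPy sketch of the parameters discussed above, assuming a dense matrix that fits in memory: `k` is the rank, `p` the oversampling, and `q` the number of power iterations, each multiplication followed by a QR re-orthonormalization. The function name `ssvd_sketch` and the synthetic test matrix are hypothetical; this is not the patch's MapReduce code, only a way to try the oversampling versus power-iteration trade-off on small data.

```python
import numpy as np


def ssvd_sketch(a, k, p=0, q=0, seed=0):
    """Rank-k randomized SVD of a dense matrix `a` (hypothetical in-memory sketch).

    k: decomposition rank, p: oversampling, q: number of power iterations.
    Illustration only; not Mahout's distributed SSVD implementation.
    """
    rng = np.random.default_rng(seed)
    # Random projection of width k + p; requires k + p <= min(a.shape).
    omega = rng.standard_normal((a.shape[1], k + p))
    qmat, _ = np.linalg.qr(a @ omega)
    # Each power iteration multiplies by A^T and then A,
    # re-orthonormalizing (QR) after every product.
    for _ in range(q):
        qmat, _ = np.linalg.qr(a.T @ qmat)
        qmat, _ = np.linalg.qr(a @ qmat)
    # SVD of the small projected matrix B = Q^T A, lifted back through Q.
    u_b, s, vt = np.linalg.svd(qmat.T @ a, full_matrices=False)
    return (qmat @ u_b)[:, :k], s[:k], vt[:k, :]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    m, n, k = 500, 300, 20
    # Hypothetical test matrix A = U diag(1, 1/2, 1/3, ...) V^T (slow decay).
    u, _ = np.linalg.qr(rng.standard_normal((m, n)))
    v, _ = np.linalg.qr(rng.standard_normal((n, n)))
    spectrum = 1.0 / np.arange(1, n + 1)
    a = (u * spectrum) @ v.T
    exact = np.linalg.svd(a, compute_uv=False)[:k]
    for p, q in [(0, 0), (10, 0), (0, 1)]:
        _, s, _ = ssvd_sketch(a, k, p=p, q=q)
        rel_err = np.abs(s - exact) / exact
        print(f"p={p:2d} q={q}: max rel. error of top {k} = {rel_err.max():.2e}")
```

The script prints the largest relative error among the top `k` singular values for each `(p, q)` pair. The numbers depend on how fast the spectrum decays, so the comparison is worth repeating on a matrix closer to the real input.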
> Modified power iterations in existing SSVD code
> -----------------------------------------------
>
> Key: MAHOUT-796
> URL: https://issues.apache.org/jira/browse/MAHOUT-796
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.5
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Labels: SSVD
> Fix For: 0.6
>
> Attachments: MAHOUT-796.patch
>
>
> Nathan Halko contacted me and pointed out the importance of having power
> iterations available and their significant effect on the accuracy of smaller
> eigenvalues and on noise attenuation.
> Essentially, we would like to introduce yet another job parameter, q, that
> governs the number of optional power iterations. The suggested modification
> to the algorithm is outlined here:
> https://github.com/dlyubimov/ssvd-lsi/wiki/Power-iterations-scratchpad
> Note that it differs from the original power iteration formula in the paper
> in that an additional orthogonalization is performed after each iteration.
> Nathan points out that this reduces the errors in the smaller eigenvalues a
> lot (if I interpret it right).
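To illustrate why the extra orthogonalization after each iteration matters, here is a NumPy sketch contrasting the unorthogonalized product Y = (A A^T)^q A Ω with the same product re-orthonormalized after every multiplication. The helper names and the synthetic matrix (singular values 1, 1/2, 1/4, ...) are assumptions for illustration. Both versions span the same subspace in exact arithmetic; in floating point, however, the columns of the plain Y all collapse toward the dominant singular vectors, so Y becomes extremely ill-conditioned and the information about the smaller singular values is swamped by rounding error.

```python
import numpy as np


def plain_range(a, omega, q):
    """Y = (A A^T)^q A Omega with no intermediate orthogonalization."""
    y = a @ omega
    for _ in range(q):
        y = a @ (a.T @ y)
    return y


def orthogonalized_range(a, omega, q):
    """Same product, but re-orthonormalized (QR) after every multiplication."""
    qmat, _ = np.linalg.qr(a @ omega)
    for _ in range(q):
        qmat, _ = np.linalg.qr(a.T @ qmat)
        qmat, _ = np.linalg.qr(a @ qmat)
    return qmat


def top_k_estimate(a, y, k):
    """Top-k singular values of Q^T A, with Q an orthonormal basis of range(Y)."""
    qmat, _ = np.linalg.qr(y)
    return np.linalg.svd(qmat.T @ a, compute_uv=False)[:k]


rng = np.random.default_rng(0)
m, n, k, p, q = 200, 100, 10, 5, 2
u, _ = np.linalg.qr(rng.standard_normal((m, n)))
v, _ = np.linalg.qr(rng.standard_normal((n, n)))
a = (u * 0.5 ** np.arange(n)) @ v.T  # singular values 1, 1/2, 1/4, ...
exact = np.linalg.svd(a, compute_uv=False)[:k]
omega = rng.standard_normal((n, k + p))

y_plain = plain_range(a, omega, q)
q_orth = orthogonalized_range(a, omega, q)
print("cond(Y) without orthogonalization:", np.linalg.cond(y_plain))
for name, y in [("plain", y_plain), ("orthogonalized", q_orth)]:
    rel_err = np.abs(top_k_estimate(a, y, k) - exact) / exact
    print(f"{name:15s} max rel. error of top {k}: {rel_err.max():.2e}")
```

This is the standard subspace-iteration form of randomized power iterations; whether it matches the scratchpad's formulation step for step is an assumption.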
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira