[
https://issues.apache.org/jira/browse/MAHOUT-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096264#comment-13096264
]
Dmitriy Lyubimov commented on MAHOUT-796:
-----------------------------------------
I am now convinced that all the inefficiency and cpu-bound behavior comes from
the multiplications Q'B, AB'. Indeed, I generate outer products which i then
output row by row and sum up in combiners and reducers. this creates tremendous
pressure in terms of keys for spill and reduce sorts.
I think tremendous improvements would result if we output outer products of B'
or AB' in vertical thin but long blocks. This seem like a trivial idea but I
was preoccupied by 'proof of concept' issues whereas matrix multiplication has
been seen as algorithmically trivial.
the command line i used:
bin/mahout ssvd --input /tmp/DRM/* --output /tmp/SSVD1 --tempDir /tmp/SSVD-tmp
-k 10 -p 3 -q 1 -r 10000 --reduceTasks 3 -Dmapred.child.java.opts='-Xmx500m'
note also that I ran on CDH3u0 without any Mahout recompilation with CDH3
binaries and it seems to work out of the box.
> Modified power iterations in existing SSVD code
> -----------------------------------------------
>
> Key: MAHOUT-796
> URL: https://issues.apache.org/jira/browse/MAHOUT-796
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.5
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Labels: SSVD
> Fix For: 0.6
>
> Attachments: MAHOUT-796-2.patch, MAHOUT-796.patch
>
>
> Nathan Halko contacted me and pointed out importance of availability of power
> iterations and their significant effect on accuracy of smaller eigenvalues
> and noise attenuation.
> Essentially, we would like to introduce yet another job parameter, q, that
> governs amount of optional power iterations. The suggestion how to modify the
> algorithm is outlined here :
> https://github.com/dlyubimov/ssvd-lsi/wiki/Power-iterations-scratchpad .
> Note that it is different from original power iterations formula in the paper
> in the sense that additional orthogonalization performed after each
> iteration. Nathan points out that that improves errors in smaller eigenvalues
> a lot (If i interpret it right).
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira