[
https://issues.apache.org/jira/browse/MAHOUT-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096598#comment-13096598
]
Dmitriy Lyubimov commented on MAHOUT-796:
-----------------------------------------
Also changed the CLI a little bit:
* Made the A block height optional, with a default value of 10,000, which
should be fine for most inputs with 64 MB splits, ~200 eigenvalues and
-Xmx500m in child processes.
* Added -oh, the outer product sparse row-wise block cardinality used by the
'big' multiplications Q'A and AB', with a default value of 10,000. It may need
to be increased for very sparse inputs to make spill sorts (map-side
combiners) more efficient, but it is always equally efficient in reduce-side
sorts, which have so far been the longest-running part of the whole job
(blocked multiplications are still the most expensive step, but they are
closing in on the QR expenses, and QR does not use sorts).
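
To illustrate why the block cardinality matters for the 'big' multiplications, here is a minimal in-memory sketch of a blocked Q'A. It is not the Mahout code: the function name `blocked_qt_a` and the dense list-of-lists layout are assumptions made for illustration only. The point is that Q'A is a sum of per-row outer products, so grouping rows into larger blocks yields the same result while emitting fewer partial products that a combiner or sort would later have to merge.

```python
def blocked_qt_a(Q, A, block_height):
    """Compute Q'A block by block (illustrative only).

    Q is m x k and A is m x n, both as lists of rows.
    Returns (result, partials), where partials is the number of
    per-block partial products that were produced and summed.
    """
    m, k, n = len(A), len(Q[0]), len(A[0])
    result = [[0.0] * n for _ in range(k)]
    partials = 0
    for start in range(0, m, block_height):
        rows = range(start, min(start + block_height, m))
        # Partial product of this block: sum over its rows of the
        # outer product (row i of Q)' * (row i of A).
        partial = [[sum(Q[i][a] * A[i][b] for i in rows) for b in range(n)]
                   for a in range(k)]
        partials += 1
        # Stand-in for the combiner/reducer: add the partial into the total.
        for a in range(k):
            for b in range(n):
                result[a][b] += partial[a][b]
    return result, partials
```

Calling `blocked_qt_a(Q, A, 1)` and `blocked_qt_a(Q, A, 10000)` on the same small input returns the same matrix, but the first produces one partial per row and the second a single partial.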
> Modified power iterations in existing SSVD code
> -----------------------------------------------
>
> Key: MAHOUT-796
> URL: https://issues.apache.org/jira/browse/MAHOUT-796
> Project: Mahout
> Issue Type: Improvement
> Components: Math
> Affects Versions: 0.5
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Labels: SSVD
> Fix For: 0.6
>
> Attachments: MAHOUT-796-2.patch, MAHOUT-796-3.patch,
> MAHOUT-796-4.patch, MAHOUT-796.patch
>
>
> Nathan Halko contacted me and pointed out the importance of having power
> iterations available and their significant effect on the accuracy of smaller
> eigenvalues and on noise attenuation.
> Essentially, we would like to introduce yet another job parameter, q, that
> governs the number of optional power iterations. The suggested modification
> to the algorithm is outlined here:
> https://github.com/dlyubimov/ssvd-lsi/wiki/Power-iterations-scratchpad .
> Note that it differs from the original power iterations formula in the paper
> in that an additional orthogonalization is performed after each iteration.
> Nathan points out that this greatly reduces errors in the smaller
> eigenvalues (if I interpret it right).
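
The idea of re-orthogonalizing after every power iteration can be sketched in a few lines. This is only an in-memory illustration under stated assumptions, not the patch itself and not necessarily the exact scheme on the wiki page: the names `range_basis`, `orthonormalize`, `matmul` and `transpose` are hypothetical, a Gaussian random projection is assumed, and modified Gram-Schmidt stands in for the distributed QR step.

```python
import random


def matmul(X, Y):
    """Dense product of two list-of-rows matrices."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def orthonormalize(Y):
    """Orthonormal basis for the columns of Y (modified Gram-Schmidt)."""
    basis = []
    for col in transpose(Y):
        v = list(col)
        for q in basis:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-12:  # skip numerically dependent columns
            basis.append([a / norm for a in v])
    return transpose(basis)


def range_basis(A, k, q, seed=0):
    """Approximate basis Q for the range of A using q power iterations.

    Instead of forming (A A')^q A Omega in one go, the basis is
    re-orthogonalized after every pass over A.
    """
    rng = random.Random(seed)
    n = len(A[0])
    omega = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    Q = orthonormalize(matmul(A, omega))             # Y = A * Omega
    for _ in range(q):
        B = matmul(transpose(Q), A)                  # B = Q'A
        Q = orthonormalize(matmul(A, transpose(B)))  # Y = A B', then orthogonalize
    return Q
```

With q = 0 this reduces to the plain random-projection step; each extra iteration costs one more Q'A and one more AB' pass over A.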