[ 
https://issues.apache.org/jira/browse/MAHOUT-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096598#comment-13096598
 ] 

Dmitriy Lyubimov commented on MAHOUT-796:
-----------------------------------------

Also changed the CLI a little bit:

* Made the A block height optional, with a default value of 10,000, which
should be fine for most inputs with 64 MB splits, ~200 eigenvalues and
-Xmx500m in child processes.

* Added -oh, the outer product sparse row-wise block cardinality used by the
'big' multiplications Q'A and AB', with a default value of 10,000. It may need
to be increased for very sparse inputs to make spill sorts (map-side
combiners) more efficient, but it is always equally efficient in reduce-side
sorts, which have so far been the longest-running part of the whole job
(blocked multiplications are still the most expensive step, but they are
closing in on the cost of QR, which does not use sorts).



> Modified power iterations in existing SSVD code
> -----------------------------------------------
>
>                 Key: MAHOUT-796
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-796
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.5
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>              Labels: SSVD
>             Fix For: 0.6
>
>         Attachments: MAHOUT-796-2.patch, MAHOUT-796-3.patch, 
> MAHOUT-796-4.patch, MAHOUT-796.patch
>
>
> Nathan Halko contacted me and pointed out the importance of having power 
> iterations available, and their significant effect on the accuracy of 
> smaller eigenvalues and on noise attenuation. 
> Essentially, we would like to introduce yet another job parameter, q, that 
> governs the number of optional power iterations. The suggested modification 
> of the algorithm is outlined here: 
> https://github.com/dlyubimov/ssvd-lsi/wiki/Power-iterations-scratchpad .
> Note that it differs from the original power iterations formula in the paper 
> in that an additional orthogonalization is performed after each iteration. 
> Nathan points out that this greatly reduces errors in the smaller 
> eigenvalues (if I interpret it right). 
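
For illustration only, here is a minimal single-machine sketch of power
iterations with an orthogonalization after each pass. It is not the code in
the attached patches and not the MapReduce implementation. It assumes the
scheme is: Y = A*Omega, Q_0 = qr(Y).Q, B_0 = Q_0'A, then for i = 1..q:
Q_i = qr(A*B_{i-1}').Q and B_i = Q_i'A. The function names (matmul,
orthonormalize, range_basis) and the use of modified Gram-Schmidt for the QR
step are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def orthonormalize(y):
    """Return Q from a thin QR of y, computed with modified Gram-Schmidt.

    Columns that are (numerically) linearly dependent are dropped.
    """
    q_cols = []
    for col in transpose(y):
        v = list(col)
        orig_norm = sum(x * x for x in v) ** 0.5
        for q in q_cols:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > 1e-10 * orig_norm:
            q_cols.append([x / norm for x in v])
    return transpose(q_cols)


def range_basis(a, k, q, seed=0):
    """Approximate range of a (m x n) with k random probes and q power iterations.

    Returns (Q, B) with Q having orthonormal columns and B = Q'A.
    """
    rng = random.Random(seed)
    n = len(a[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    basis = orthonormalize(matmul(a, omega))  # Q_0 = qr(A * Omega).Q
    b = matmul(transpose(basis), a)           # B_0 = Q_0' A
    for _ in range(q):
        # Re-orthogonalize after every pass instead of forming (AA')^q A Omega.
        basis = orthonormalize(matmul(a, transpose(b)))  # Q_i = qr(A * B_{i-1}').Q
        b = matmul(transpose(basis), a)                  # B_i = Q_i' A
    return basis, b
```

With q = 0 this reduces to the plain random projection step. The singular
values would then be obtained from the small matrix B*B', which is omitted
here to keep the sketch short.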

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
