[ 
https://issues.apache.org/jira/browse/MAHOUT-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094041#comment-13094041
 ] 

Ted Dunning commented on MAHOUT-796:
------------------------------------

{quote}
At this point it seems that the best strategy is just to preload entire A block 
into memory as a (sparse) matrix and open B' stream as a side file and hope it 
is not going to generate too much flood i/o. I don't know a workaround for it 
anyway since whatever blocking scheme is used, we need cartesian products from 
both matrix inputs and that will cause i/o and i don't think there's any clever 
collocation trick to be had there
{quote}

Presumably there could be a role for the distributed cache here to make the I/O 
load more manageable.

This is just the sort of situation where MapR's ability to control placement 
and mirroring, and to read via NFS, comes in really, really handy.  We can't 
really assume that for Mahout, though.
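
To make the quoted strategy concrete, here is a minimal sketch of one reading 
of it: the A block is held in memory as a sparse structure and B' is consumed 
in a single sequential pass, one row at a time. The function name, the 
dict-of-dicts layout for the A block and the (index, row) stream format are 
all assumptions made for illustration; none of this is Mahout code.

```python
from collections import defaultdict


def multiply_block_by_streamed_bt(a_block, bt_rows, k):
    """Compute a_block * B' while reading B' one row at a time.

    a_block: dict mapping row index i -> {column j: value} (hypothetical
             in-memory sparse layout for the preloaded A block)
    bt_rows: iterable of (j, row) pairs, where row is a list of k floats
             (stands in for the side file holding B')
    k:       number of columns of B'
    """
    # Index the preloaded block by column so that each streamed row of B'
    # only touches the nonzeros of A that actually need it.
    by_col = defaultdict(list)
    for i, row in a_block.items():
        for j, v in row.items():
            by_col[j].append((i, v))

    out = {i: [0.0] * k for i in a_block}
    for j, bt_row in bt_rows:  # single sequential pass over the side file
        for i, v in by_col.get(j, ()):
            acc = out[i]
            for c in range(k):
                acc[c] += v * bt_row[c]
    return out
```

Every task holding an A block still has to read all of B', which is the I/O 
concern raised in the quote; the sketch only shows that the read can be one 
sequential scan per block.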


> Modified power iterations in existing SSVD code
> -----------------------------------------------
>
>                 Key: MAHOUT-796
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-796
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Math
>    Affects Versions: 0.5
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>              Labels: SSVD
>             Fix For: 0.6
>
>
> Nathan Halko contacted me and pointed out the importance of making power 
> iterations available, given their significant effect on the accuracy of 
> smaller eigenvalues and on noise attenuation. 
> Essentially, we would like to introduce one more job parameter, q, that 
> governs the number of optional power iterations. The suggested modification 
> to the algorithm is outlined here: 
> https://github.com/dlyubimov/ssvd-lsi/wiki/Power-iterations-scratchpad .
> Note that it differs from the original power iterations formula in the paper 
> in that an additional orthogonalization is performed after each 
> iteration. Nathan points out that this greatly reduces errors in the smaller 
> eigenvalues (if I interpret it right). 
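
As a rough illustration of power iterations with an orthogonalization step 
after every pass, the sketch below works on small dense matrices in plain 
Python. It is an assumed reading of the idea (Y = A B', orthonormalize Y into 
Q, then B = Q' A), not the formula from the linked scratchpad and not Mahout 
code. Modified Gram-Schmidt stands in for whatever QR step a real job would 
use, and all function names are hypothetical.

```python
import random


def transpose(x):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*x)]


def matmul(x, y):
    """Dense product of two matrices stored as lists of rows."""
    yt = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]


def orthonormalize(y):
    """Modified Gram-Schmidt on the columns of y."""
    basis = []
    for col in zip(*y):
        v = list(col)
        for q in basis:
            dot = sum(a * b for a, b in zip(q, v))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-12:  # drop numerically dependent columns
            basis.append([a / norm for a in v])
    return transpose(basis)


def power_iterations(a, k, q, seed=0):
    """Return (Q, B) with B = Q' A after q re-orthogonalized iterations."""
    rng = random.Random(seed)
    n = len(a[0])
    # Random Gaussian test matrix with k columns.
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
    qmat = orthonormalize(matmul(a, omega))
    b = matmul(transpose(qmat), a)
    for _ in range(q):
        y = matmul(a, transpose(b))     # Y = A B'
        qmat = orthonormalize(y)        # orthogonalize after each iteration
        b = matmul(transpose(qmat), a)  # B = Q' A
    return qmat, b
```

With q = 0 this reduces to the plain random projection step; each extra 
iteration costs another pass over A for A B' and one more for Q' A.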

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
