[ 
https://issues.apache.org/jira/browse/MAHOUT-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114282#comment-13114282
 ] 

Dmitriy Lyubimov commented on MAHOUT-792:
-----------------------------------------

Yes... and that's what I mean. I was implementing it as Y-on-the-fly. But that 
implies a full pass over A every time we need access to Y, and we need it 3 
times with no option to parallelize. That's why I think I need to save it.
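
A minimal sketch of the trade-off described above, in plain Python rather than 
Mahout code. It assumes Y = A * Omega for a small random matrix Omega, and it 
uses a Gram-matrix accumulation (Y'Y) as a stand-in for whatever the three 
consumers of Y actually are. RowSource, y_rows, gram and the two driver 
functions are hypothetical names for illustration only; the point is just the 
number of full passes over A.

```python
class RowSource:
    """Stands in for A stored out of core; counts full passes over its rows."""

    def __init__(self, rows):
        self.rows = rows
        self.passes = 0

    def __iter__(self):
        # Every new iteration models one full read of A.
        self.passes += 1
        return iter(self.rows)


def y_rows(a, omega):
    """Yield the rows of Y = A * Omega, reading A one row at a time."""
    k = len(omega[0])
    for row in a:
        yield [sum(row[j] * omega[j][c] for j in range(len(row)))
               for c in range(k)]


def gram(rows, k):
    """Hypothetical consumer of Y: accumulate the k x k matrix Y'Y."""
    g = [[0.0] * k for _ in range(k)]
    for r in rows:
        for i in range(k):
            for j in range(k):
                g[i][j] += r[i] * r[j]
    return g


def three_uses_on_the_fly(a, omega):
    """Recompute Y from A for each of the three uses: three passes over A."""
    k = len(omega[0])
    return [gram(y_rows(a, omega), k) for _ in range(3)]


def three_uses_saved(a, omega):
    """Materialize Y once, then reuse it: one pass over A."""
    k = len(omega[0])
    y = list(y_rows(a, omega))  # the "saved" Y; on a cluster this would be a file
    return [gram(y, k) for _ in range(3)]
```

Both drivers return the same results; only the pass count on the RowSource 
differs. The saved variant trades that for storing Y, which is cheap whenever 
Y is much smaller than A.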

Also, I am thinking one step ahead, to power iterations. In that chain, Yi is 
AB', and there's no way to compute that on the fly. So, for a first stab at it, 
it would make my life easier to save Y with a low degree of replication, and 
then use the rest of the pipeline the same way regardless of i. Besides, I 
think that saving Y should make things more efficient, not less, in most 
generic cases, assuming A >> Y in volume.
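
The chain mentioned above can be sketched in memory as follows. This is an 
illustration under assumptions, not the patch's code: it takes Y0 = A * Omega, 
orthonormalizes Y with Gram-Schmidt (a stand-in for whichever QR or Cholesky 
step the real pipeline uses), forms B = Q'A, and then sets Yi = A * B' for each 
power iteration. All function names are hypothetical. Y0 only needs a row of A 
and an Omega that can be regenerated from a seed, whereas Yi needs B, which 
itself comes out of a previous full pass over A; that is one reading of why Yi 
cannot be produced on the fly.

```python
import random


def matmul(x, y):
    """Dense matrix product of two lists-of-rows."""
    yt = list(zip(*y))
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]


def transpose(x):
    return [list(c) for c in zip(*x)]


def orthonormalize(y):
    """Modified Gram-Schmidt on the columns of Y; returns Q (same shape as Y)."""
    q_cols = []
    for v in (list(c) for c in zip(*y)):
        for u in q_cols:
            d = sum(a * b for a, b in zip(u, v))
            v = [a - d * b for a, b in zip(v, u)]
        norm = sum(a * a for a in v) ** 0.5
        q_cols.append([a / norm for a in v])
    return transpose(q_cols)


def rest_of_pipeline(a, y):
    """The part that is identical for every i: Q = orth(Y), B = Q'A."""
    q = orthonormalize(y)
    b = matmul(transpose(q), a)
    return q, b


def ssvd_basis(a, k, power_iters, seed=0):
    """Return (Q, B) after the given number of power iterations."""
    rng = random.Random(seed)
    n = len(a[0])
    omega = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
    y = matmul(a, omega)              # Y0 = A * Omega
    q, b = rest_of_pipeline(a, y)
    for _ in range(power_iters):
        y = matmul(a, transpose(b))   # Yi = A * B', B taken from the previous step
        q, b = rest_of_pipeline(a, y)
    return q, b
```

Because rest_of_pipeline only ever sees a materialized Y, the same code path 
serves i = 0 and every later iteration, which is the simplification the comment 
is after.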

> Add new stochastic decomposition code
> -------------------------------------
>
>                 Key: MAHOUT-792
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-792
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>         Attachments: MAHOUT-792.patch, MAHOUT-792.patch, sd-2.pdf
>
>
> I have figured out some simplification for our SSVD algorithms.  This 
> eliminates the QR decomposition and makes life easier.
> I will produce a patch that contains the following:
>   - a CholeskyDecomposition implementation that does pivoting (and thus 
> rank-revealing) or not.  This should actually be useful for solution of large 
> out-of-core least squares problems.
>   - an in-memory SSVD implementation that should work for matrices up to 
> about 1/3 of available memory.
>   - an out-of-core SSVD threaded implementation that should work for very 
> large matrices.  It should take time about equal to the cost of reading the 
> input matrix 4 times and will require working disk roughly equal to the size 
> of the input.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
