[ https://issues.apache.org/jira/browse/MAHOUT-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092071#comment-13092071 ]

Dmitriy Lyubimov commented on MAHOUT-792:
-----------------------------------------

bq. But A is sparse so Y, B and Q are all about the same (storage) size as A. 
In fact, if k+p > average number of elements per row of A, then Y, B and Q will 
all be larger than A.

That's true. I had one guy who had millions of rows but only about 10 
measurements per row on average. Probably ratings or something. It is not going 
to be efficient (CPU-wise) in these cases. 
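
The size comparison is easy to make concrete with made-up numbers in the spirit of that example (the row count, density, and k+p below are all hypothetical):

```python
# Hypothetical ratings-like input: millions of rows, ~10 nonzeros per row.
n_rows = 2_000_000
avg_nnz_per_row = 10   # stored entries per row of sparse A
k = 100                # requested decomposition rank
p = 10                 # oversampling

nnz_A = n_rows * avg_nnz_per_row   # entries actually stored for sparse A
entries_Y = n_rows * (k + p)       # Y = A * Omega is dense: n_rows x (k+p)

# Y (and likewise Q, and B's footprint) dwarfs the sparse input here:
ratio = entries_Y / nnz_A
print(ratio)  # 11.0 -- Y holds 11x more values than A stores
```

Whenever k+p exceeds the average nonzeros per row, that ratio is greater than 1 and the "reduced" matrices are bigger than the input.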

But my response has always been: if your input is thinner than the projection, 
why use a projection at all? My understanding is that the whole idea is to 
drastically reduce the input to analyze. As far as I remember, the original 
paper never suggested computing BB'; that's something I did to open up n at the 
cost of some rounding error. In the original paper they compute the SVD of B, 
so if B is larger than the input, it would only cost more to compute its SVD. 
So that's how I understood it -- B is _supposed_ to be much smaller than A in 
the original work; otherwise there's not much sense. 
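
For reference, the two routes being contrasted can be sketched in a few lines of NumPy (a toy sketch, not the Mahout code; the dimensions, the rank-k test matrix, and the seed are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k, p = 500, 300, 10, 5

A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-k input
Omega = rng.standard_normal((n, k + p))      # random projection matrix

Y = A @ Omega                   # m x (k+p) sample of A's column space
Q, _ = np.linalg.qr(Y)          # orthonormal basis so that A ~= Q Q' A
B = Q.T @ A                     # small (k+p) x n matrix

# Route 1 (original paper): take the SVD of B directly.
sv_from_B = np.linalg.svd(B, compute_uv=False)

# Route 2 (the BB' variant): the eigenvalues of B B' are the squared
# singular values of B, and B B' is only (k+p) x (k+p) regardless of n.
eigvals = np.linalg.eigvalsh(B @ B.T)              # ascending order
sv_from_BBt = np.sqrt(np.clip(eigvals[::-1], 0.0, None))

print(np.allclose(sv_from_B[:k], sv_from_BBt[:k]))  # True: top-k values agree
```

The trade-off mentioned above shows up here: route 2 never materializes anything n-wide beyond B itself, but squaring the singular values roughly halves the usable precision.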

> Add new stochastic decomposition code
> -------------------------------------
>
>                 Key: MAHOUT-792
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-792
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>         Attachments: MAHOUT-792.patch, MAHOUT-792.patch, sd-2.pdf
>
>
> I have figured out some simplification for our SSVD algorithms.  This 
> eliminates the QR decomposition and makes life easier.
> I will produce a patch that contains the following:
>   - a CholeskyDecomposition implementation that does pivoting (and is thus 
> rank-revealing) or not.  This should actually be useful for solving large 
> out-of-core least squares problems.
>   - an in-memory SSVD implementation that should work for matrices up to 
> about 1/3 of available memory.
>   - an out-of-core SSVD threaded implementation that should work for very 
> large matrices.  It should take time about equal to the cost of reading the 
> input matrix 4 times and will require working disk roughly equal to the size 
> of the input.
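
The pivoting variant mentioned in the description can be sketched in outline (a generic NumPy sketch of a rank-revealing pivoted Cholesky, not the Mahout patch; the demo matrix and tolerance are made up):

```python
import numpy as np

def pivoted_cholesky(M, tol=1e-12):
    """Rank-revealing pivoted Cholesky of a symmetric PSD matrix M.

    Returns (L, piv, rank) with M[np.ix_(piv, piv)] ~= L @ L.T; only the
    first `rank` columns of L are nonzero.
    """
    M = np.array(M, dtype=float)          # work on a copy
    n = M.shape[0]
    L = np.zeros((n, n))
    piv = np.arange(n)
    thresh = tol * max(float(np.max(np.diag(M))), 1.0)
    for i in range(n):
        # Pivot: bring the largest remaining diagonal entry to position i.
        j = i + int(np.argmax(np.diag(M)[i:]))
        if M[j, j] <= thresh:             # trailing block is numerically zero,
            return L, piv, i              # so the numerical rank is i
        M[[i, j]] = M[[j, i]]; M[:, [i, j]] = M[:, [j, i]]
        L[[i, j]] = L[[j, i]]; piv[[i, j]] = piv[[j, i]]
        L[i, i] = np.sqrt(M[i, i])
        L[i + 1:, i] = M[i + 1:, i] / L[i, i]
        # Downdate the trailing Schur complement.
        M[i + 1:, i + 1:] -= np.outer(L[i + 1:, i], L[i + 1:, i])
    return L, piv, n

# Tiny demo on a deliberately rank-deficient PSD matrix:
rng = np.random.default_rng(1)
X = rng.standard_normal((8, 4))
M = X @ X.T                               # 8x8 PSD matrix of rank 4
L, piv, rank = pivoted_cholesky(M)
```

On a rank-deficient PSD matrix the factorization stops early, which is what makes the pivoted variant rank-revealing; without pivoting a plain Cholesky would break down on such input.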

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
