[
https://issues.apache.org/jira/browse/MAHOUT-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114109#comment-13114109
]
Dmitriy Lyubimov edited comment on MAHOUT-792 at 9/25/11 2:27 AM:
------------------------------------------------------------------
Another thing I realized while working on MAHOUT-797 is that this execution
plan requires three passes over the input A instead of the existing single
pass if we are not saving Y (the first to compute Y'Y, the second to compute
B, and the third to compute U).
So we either have to save Y, or incur extra passes over the big dataset
(assuming A is much denser than Y).
And if we do save Y, then we swap Q for Y, which are of the same size.
So it looks like we have to save Y... or it is not worth it.
was (Author: dlyubimov):
Another thing I realized while working on MAHOUT-797 is that this execution
plan requires three passes over the input A instead of the existing single
pass if we are not saving Y (the first to compute Y'Y, the second to compute
B, and the third to compute U).
So we either have to save Y, or incur extra passes over the big dataset
(assuming A is much denser than Y).
And if we do save Y, then we swap Q for Y, and Y would be much bigger than Q
if A is tall enough.
Potentially we are not saving the intermediate Q blocks in the middle (for
which we can reduce replication), but in my task running over A three times
actually increases the running time.
So there are probably cases where this execution plan would be sufficiently
preferable, but it seems that A must be super-sparse and wide (requirements
that somewhat contradict each other).
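To make the pass-count argument above concrete, here is a minimal NumPy sketch (toy dense matrices and NumPy calls standing in for the Mahout MapReduce code, so everything below is illustrative only) of the Cholesky-based plan with Y = A*Omega not persisted. Each product that touches A stands in for one full pass over the big input, and Y has to be rebuilt inside every one of them.

{code}
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 2000, 300, 25
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # toy A of rank k
Omega = rng.standard_normal((n, k))                            # random projection

# Pass 1 over A: form Y = A*Omega on the fly, keep only the small k x k Gram
# matrix Y'Y (accumulated blockwise in the real job), and drop Y afterwards.
Y = A @ Omega
YtY = Y.T @ Y
R = np.linalg.cholesky(YtY).T      # Y'Y = R'R, so implicitly Y = Q R, Q = Y R^{-1}

# Pass 2 over A: B = Q'A = R^{-T}(Y'A); Y was not saved, so it is recomputed.
Y = A @ Omega
B = np.linalg.solve(R.T, Y.T @ A)
Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)  # small SVD of the k x n B

# Pass 3 over A: U = Q*Uhat = (Y R^{-1}) Uhat needs Y yet again.
Y = A @ Omega
U = Y @ np.linalg.solve(R, Uhat)

# Persisting Y after pass 1 would remove the recomputations in passes 2 and 3,
# at the cost of storing the m x k Y -- the same footprint as Q.
print("max reconstruction error:", np.max(np.abs(U @ np.diag(s) @ Vt - A)))
{code}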
> Add new stochastic decomposition code
> -------------------------------------
>
> Key: MAHOUT-792
> URL: https://issues.apache.org/jira/browse/MAHOUT-792
> Project: Mahout
> Issue Type: New Feature
> Reporter: Ted Dunning
> Attachments: MAHOUT-792.patch, MAHOUT-792.patch, sd-2.pdf
>
>
> I have figured out some simplification for our SSVD algorithms. This
> eliminates the QR decomposition and makes life easier.
> I will produce a patch that contains the following:
> - a CholeskyDecomposition implementation that does pivoting (and is thus
> rank-revealing) or not. This should actually be useful for the solution of
> large out-of-core least-squares problems.
> - an in-memory SSVD implementation that should work for matrices up to
> about 1/3 of available memory.
> - an out-of-core SSVD threaded implementation that should work for very
> large matrices. It should take time about equal to the cost of reading the
> input matrix 4 times and will require working disk roughly equal to the size
> of the input.
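The "eliminates the QR decomposition" part of the description rests on the identity Y'Y = (QR)'(QR) = R'R: the triangular factor R can be obtained from a Cholesky factorization of the small k x k Gram matrix Y'Y instead of a QR factorization of the tall Y, and Q recovered as Y*R^{-1} where it is needed at all. A minimal NumPy sketch of that substitution (plain unpivoted Cholesky, full column rank assumed; the pivoted, rank-revealing variant and the out-of-core machinery from the patch are left out):

{code}
import numpy as np

rng = np.random.default_rng(1)
m, k = 500, 15
Y = rng.standard_normal((m, k))            # stands in for Y = A @ Omega

# Conventional route: thin QR of the tall m x k matrix Y.
Q_qr, R_qr = np.linalg.qr(Y)

# Cholesky route: factor the small k x k Gram matrix instead.
R_chol = np.linalg.cholesky(Y.T @ Y).T     # upper triangular, Y'Y = R'R
Q_chol = np.linalg.solve(R_chol.T, Y.T).T  # Q = Y R^{-1}, formed only if needed

# The two triangular factors agree up to row signs, and the Cholesky-derived Q
# is orthonormal and reproduces Y.
print(np.allclose(np.abs(R_qr), np.abs(R_chol)))
print(np.allclose(Q_chol.T @ Q_chol, np.eye(k), atol=1e-10))
print(np.allclose(Q_chol @ R_chol, Y))
{code}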
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira