[
https://issues.apache.org/jira/browse/MAHOUT-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114287#comment-13114287
]
Ted Dunning commented on MAHOUT-792:
------------------------------------
{quote}
So we either have to save Y, or incur passes over big dataset (assuming A is
much densier than Y).
{quote}
I think that this got mangled in the typing. A is sparser than Y and may be
larger or smaller on disk depending on the average number of non-zero elements.
If A is binary or contains only small integers, it could well be smaller on
disk than Y when a continuous random matrix Omega is used. If Omega instead
contains only -1, 0, and 1 (trinary values), then the values of Y should
compress nearly as well as the values of A; Y will also be dense, so we don't
have to store the indices.
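To make the trade-off concrete, here is a minimal NumPy sketch (hypothetical illustration, not the Mahout code) of projecting a sparse binary A against a trinary Omega; the resulting Y is dense but holds only small integers, which is what makes it compress nearly as well as A:

```python
import numpy as np

rng = np.random.default_rng(42)
m, n, k = 1000, 500, 20

# Sparse binary input matrix A (~1% non-zeros), stored as small integers.
A = (rng.random((m, n)) < 0.01).astype(np.int8)

# Trinary random projection matrix Omega with entries in {-1, 0, 1}.
Omega = rng.choice(np.array([-1, 0, 1], dtype=np.int8), size=(n, k))

# Y = A * Omega is dense (m x k) but contains only small integer values,
# so no indices need to be stored and the values compress well.
Y = A.astype(np.int32) @ Omega.astype(np.int32)
```

With a continuous (e.g. Gaussian) Omega, Y would instead be full of floating-point values, which compress poorly compared to the small-integer entries above.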
> Add new stochastic decomposition code
> -------------------------------------
>
> Key: MAHOUT-792
> URL: https://issues.apache.org/jira/browse/MAHOUT-792
> Project: Mahout
> Issue Type: New Feature
> Reporter: Ted Dunning
> Attachments: MAHOUT-792.patch, MAHOUT-792.patch, sd-2.pdf
>
>
> I have figured out some simplifications for our SSVD algorithms. These
> eliminate the QR decomposition and make life easier.
> I will produce a patch that contains the following:
> - a CholeskyDecomposition implementation with optional pivoting (and thus
> rank-revealing when pivoting is enabled). This should also be useful for
> solving large out-of-core least squares problems.
> - an in-memory SSVD implementation that should work for matrices up to
> about 1/3 of available memory.
> - an out-of-core, threaded SSVD implementation that should work for very
> large matrices. It should take time about equal to the cost of reading the
> input matrix 4 times and will require working disk roughly equal to the size
> of the input.
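The QR-free idea in the description can be sketched with a Cholesky-based orthogonalization: since Y = QR implies Y'Y = R'R, a Cholesky factorization of the small k x k Gram matrix Y'Y recovers R (as L') without ever forming Q explicitly via QR. A hypothetical NumPy sketch, not the attached patch itself:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 300, 120, 10
A = rng.standard_normal((m, n))
Omega = rng.choice([-1.0, 0.0, 1.0], size=(n, k))

Y = A @ Omega                    # random projection, m x k
# Instead of Q, R = qr(Y), factor the small k x k Gram matrix:
L = np.linalg.cholesky(Y.T @ Y)  # Y'Y = L L'
Q = np.linalg.solve(L, Y.T).T    # Q = Y L^{-T} has orthonormal columns
B = Q.T @ A                      # small k x n projection of A
Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ Ub                       # approximate left singular vectors of A
```

The Gram matrix Y'Y is only k x k, so the Cholesky step is cheap regardless of how large A is; the dominant cost is the passes over A, matching the "4 reads of the input" estimate above.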
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira