[
https://issues.apache.org/jira/browse/MAHOUT-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114287#comment-13114287
]
Ted Dunning commented on MAHOUT-792:
------------------------------------
{quote}
So we either have to save Y, or incur passes over big dataset (assuming A is
much densier than Y).
{quote}
I think that this got mangled in the typing. A is sparser than Y and may be
larger or smaller on disk depending on the average number of non-zero elements.
If A is binary or contains only small integers, it could well be smaller on
disk than Y when a continuous random matrix Omega is used. If Omega instead
contains only -1, 0, and 1 (trinary values), then the values of Y should
compress nearly as well as the values of A; Y will also be dense, so we don't
have to store the indices.
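To make the trade-off concrete, here is a minimal NumPy sketch (hypothetical illustration, not the Mahout code) of projecting a sparse binary A against a trinary Omega; the resulting Y is dense but holds only small integers, which is what makes it compress nearly as well as A:

```python
import numpy as np

rng = np.random.default_rng(42)
m, n, k = 1000, 500, 20

# Sparse binary input matrix A (~1% non-zeros), stored as small integers.
A = (rng.random((m, n)) < 0.01).astype(np.int8)

# Trinary random projection matrix Omega with entries in {-1, 0, 1}.
Omega = rng.choice(np.array([-1, 0, 1], dtype=np.int8), size=(n, k))

# Y = A * Omega is dense (m x k) but contains only small integer values,
# so no indices need to be stored and the values compress well.
Y = A.astype(np.int32) @ Omega.astype(np.int32)
```

With a continuous (e.g. Gaussian) Omega, Y would instead be full of floating-point values, which compress poorly compared to the small-integer entries above.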
> Add new stochastic decomposition code
> -------------------------------------
>
> Key: MAHOUT-792
> URL: https://issues.apache.org/jira/browse/MAHOUT-792
> Project: Mahout
> Issue Type: New Feature
> Reporter: Ted Dunning
> Attachments: MAHOUT-792.patch, MAHOUT-792.patch, sd-2.pdf
>
>
> I have figured out some simplifications for our SSVD algorithms. These
> eliminate the QR decomposition and make life easier.
> I will produce a patch that contains the following:
> - a CholeskyDecomposition implementation with optional pivoting (and thus
> rank-revealing when pivoting is enabled). This should also be useful for
> solving large out-of-core least squares problems.
> - an in-memory SSVD implementation that should work for matrices up to
> about 1/3 of available memory.
> - an out-of-core, threaded SSVD implementation that should work for very
> large matrices. It should take time about equal to the cost of reading the
> input matrix 4 times and will require working disk roughly equal to the size
> of the input.
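The QR-free idea in the description can be sketched with a Cholesky-based orthogonalization: since Y = QR implies Y'Y = R'R, a Cholesky factorization of the small k x k Gram matrix Y'Y recovers R (as L') without ever forming Q explicitly via QR. A hypothetical NumPy sketch, not the attached patch itself:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 300, 120, 10
A = rng.standard_normal((m, n))
Omega = rng.choice([-1.0, 0.0, 1.0], size=(n, k))

Y = A @ Omega                    # random projection, m x k
# Instead of Q, R = qr(Y), factor the small k x k Gram matrix:
L = np.linalg.cholesky(Y.T @ Y)  # Y'Y = L L'
Q = np.linalg.solve(L, Y.T).T    # Q = Y L^{-T} has orthonormal columns
B = Q.T @ A                      # small k x n projection of A
Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ Ub                       # approximate left singular vectors of A
```

The Gram matrix Y'Y is only k x k, so the Cholesky step is cheap regardless of how large A is; the dominant cost is the passes over A, matching the "4 reads of the input" estimate above.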
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira