[
https://issues.apache.org/jira/browse/MAHOUT-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114109#comment-13114109
]
Dmitriy Lyubimov commented on MAHOUT-792:
-----------------------------------------
Another thing I realized while working on MAHOUT-797 is that, if we are not
saving Y, this execution plan requires three passes over the input A, compared
with the existing implementation: the first pass to compute Y'Y, the second to
compute B, and the third to compute U.
So we either have to save Y or incur extra passes over a big dataset (assuming
A is much denser than Y).
And if we do save Y, then we swap Q for Y, and Y would be much bigger than Q if
A is tall enough.
Potentially we are not saving the intermediate Q blocks in the middle (for
which we can reduce replication), but in my task running 3 times over A
actually increases running time.
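
To make the three passes concrete, here is a minimal in-memory NumPy sketch. It
is not the MAHOUT-792 patch. It assumes the plan is Y = A*Omega, a Cholesky
factorization Y'Y = L L', Q = Y L^-T (never materialized), B = Q'A, and
U = Q*U_hat, where U_hat comes from a small SVD of B. The names row_blocks and
ssvd_three_pass and all parameters are hypothetical, and each loop over
row_blocks(A, ...) stands in for one full pass over the input.

```python
import numpy as np


def row_blocks(A, block_rows):
    """Yield consecutive row blocks of A; one full iteration models one pass over the input."""
    for start in range(0, A.shape[0], block_rows):
        yield A[start:start + block_rows]


def ssvd_three_pass(A, k, p=5, block_rows=100, seed=0):
    """Hypothetical Cholesky-based stochastic SVD that never stores Y or Q.

    Returns (U, s, V, passes), where passes counts full scans of A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    r = k + p                              # target rank plus oversampling
    omega = rng.standard_normal((n, r))    # random projection matrix
    passes = 0

    # Pass 1: accumulate Y'Y block by block, recomputing Y_i = A_i * Omega.
    yty = np.zeros((r, r))
    for block in row_blocks(A, block_rows):
        y = block @ omega
        yty += y.T @ y
    passes += 1

    # Y'Y = L L', so Q = Y L^-T has orthonormal columns (assumes Y has full column rank).
    L = np.linalg.cholesky(yty)

    # Pass 2: B = Q'A = L^-1 (Y'A), again recomputing Y_i from A_i.
    yta = np.zeros((r, n))
    for block in row_blocks(A, block_rows):
        yta += (block @ omega).T @ block
    passes += 1
    B = np.linalg.solve(L, yta)

    # Small (r x n) SVD done in memory.
    u_hat, s, vt = np.linalg.svd(B, full_matrices=False)

    # Pass 3: U = Q * U_hat = Y * (L^-T U_hat), one row block at a time.
    w = np.linalg.solve(L.T, u_hat)
    U = np.vstack([(block @ omega) @ w for block in row_blocks(A, block_rows)])
    passes += 1

    return U[:, :k], s[:k], vt[:k].T, passes
```

Saving Y during the first pass would let passes 2 and 3 read Y instead of
recomputing A_i * Omega, at the cost of storing an m x (k+p) matrix. Pass 2
still needs A itself to form Y'A.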
> Add new stochastic decomposition code
> -------------------------------------
>
> Key: MAHOUT-792
> URL: https://issues.apache.org/jira/browse/MAHOUT-792
> Project: Mahout
> Issue Type: New Feature
> Reporter: Ted Dunning
> Attachments: MAHOUT-792.patch, MAHOUT-792.patch, sd-2.pdf
>
>
> I have figured out some simplification for our SSVD algorithms. This
> eliminates the QR decomposition and makes life easier.
> I will produce a patch that contains the following:
> - a CholeskyDecomposition implementation that does pivoting (and thus
> rank-revealing) or not. This should actually be useful for solution of large
> out-of-core least squares problems.
> - an in-memory SSVD implementation that should work for matrices up to
> about 1/3 of available memory.
> - an out-of-core SSVD threaded implementation that should work for very
> large matrices. It should take time about equal to the cost of reading the
> input matrix 4 times and will require working disk roughly equal to the size
> of the input.