[ https://issues.apache.org/jira/browse/MAHOUT-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092001#comment-13092001 ]

Ted Dunning commented on MAHOUT-792:
------------------------------------

{quote}
1 – it looks like this method is compressing a lot of info into the (k+p) x (k+p) 
upper-triangular R. I guess there's more potential for rounding errors there. 
(I am not buying too much into rounding errors, though.)
{quote}

This is absolutely true.  On the other hand, this only puts as much information 
into that matrix as we ultimately want back out (i.e. we *are* reducing 
dimension here).

{quote}
2 – The argument about the Q matrix being large is not very convincing to me. 
Q is m x (k+p); that is, assuming A is much wider than that (in terms of the 
average number of non-zero elements per row), which it should be in order for 
the projection method to make sense at all, Q is only a fraction of A.
{quote}
But A is sparse, so Y, B and Q are all about the same (storage) size as A.  In 
fact, if k+p is larger than the average number of non-zero elements per row of 
A, then Y, B and Q will all be *larger* than A.
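
Rough numbers to illustrate (my own, not from the thread): if A is m x n with 
an average of z non-zeros per row, A takes about m*z stored entries while Q is 
dense at m*(k+p) entries.  With z = 100 and k+p = 500, Q is roughly five times 
the size of A.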

{quote}
Besides, it is written in blocks by mappers and read by mappers, so each 
mapper sees only one block of size, say, 30,000 x (k+p), which will not be 
large assuming A is sufficiently wide: with k+p = 500, that amounts to 120 MB 
per mapper. So there's not that much actual memory pressure; that's what 
distributed computations are for. The same goes for dense operations: we just 
distribute them.
{quote}
Of course.  But using the Cholesky trick means that Q'A can be computed by 
reading only A instead of both Q and A.
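
To make that concrete, here is a minimal in-memory sketch of what I mean 
(plain Java over dense arrays; the class and method names are illustrative, 
not from the patch).  Each row y_i = a_i * Omega is regenerated from the seed 
while streaming over the rows of A; Y'Y and Y'A are accumulated in that same 
pass; R is the Cholesky factor of Y'Y; and B = Q'A falls out of the triangular 
solve R'B = Y'A, so Q is never formed at all:

{code}
import java.util.Random;

public class CholeskyTrickSketch {
  /**
   * Computes B = Q'A for Y = A * Omega, Y'Y = R'R, Q = Y * inv(R),
   * reading the rows of A once and never materializing Q.
   */
  static double[][] streamB(double[][] a, int r, long seed) {
    int m = a.length, n = a[0].length;
    double[][] yty = new double[r][r];   // accumulates Y'Y
    double[][] yta = new double[r][n];   // accumulates Y'A

    for (int i = 0; i < m; i++) {
      // y = a_i * Omega, with row j of Omega regenerated from the seed
      double[] y = new double[r];
      for (int j = 0; j < n; j++) {
        if (a[i][j] == 0.0) continue;                  // exploit sparsity of A
        Random gen = new Random(seed ^ (j * 0x9E3779B97F4A7C15L));
        for (int q = 0; q < r; q++) {
          y[q] += a[i][j] * gen.nextGaussian();        // Omega[j][q]
        }
      }
      for (int p = 0; p < r; p++) {
        for (int q = 0; q < r; q++) yty[p][q] += y[p] * y[q];
        for (int j = 0; j < n; j++) yta[p][j] += y[p] * a[i][j];
      }
    }

    double[][] u = cholesky(yty);        // upper-triangular R with Y'Y = R'R
    // Solve R' * B = Y'A by forward substitution (R' is lower-triangular):
    // B = inv(R') * Y'A = inv(R') * (A*Omega)' * A = Q'A.
    double[][] b = new double[r][n];
    for (int j = 0; j < n; j++) {
      for (int p = 0; p < r; p++) {
        double s = yta[p][j];
        for (int q = 0; q < p; q++) s -= u[q][p] * b[q][j];
        b[p][j] = s / u[p][p];
      }
    }
    return b;
  }

  /** Plain (non-pivoting) Cholesky: returns upper-triangular U with M = U'U. */
  static double[][] cholesky(double[][] mat) {
    int r = mat.length;
    double[][] u = new double[r][r];
    for (int i = 0; i < r; i++) {
      for (int j = i; j < r; j++) {
        double s = mat[i][j];
        for (int q = 0; q < i; q++) s -= u[q][i] * u[q][j];
        u[i][j] = (i == j) ? Math.sqrt(s) : s / u[i][i];
      }
    }
    return u;
  }
}
{code}

Regenerating a row of Omega per non-zero, as written, is wasteful; a real 
implementation would cache blocks of Omega.  But the point stands: nothing 
beyond the seed and A itself ever needs to be read.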

{quote}
3 – it looks like you are writing B as the output of the second MR, and B is 
(k+p) x n. Going back to the 'big Q' argument, we can't assume that B would be 
any smaller.
{quote}
I completely agree.  B is too big to store all in memory and could easily be 
comparable to or larger than A.
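
To put a number on it (my own back-of-the-envelope): with n = 10^7 columns and 
k+p = 500, a dense B has 5 x 10^9 entries, which is about 40 GB of doubles.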

{quote}
In fact, some time ago I came to the conclusion that the projection method 
would be much more efficient if the input is wide rather than tall, since the 
projection compression factor would be much better for what seems to be a 
fairly inexpensive operation (it costs nothing to redistribute Omega, which is 
backed only by a single long used as the random generator seed, as opposed to 
actual tall or wide bands such as B or Q). So we can't exclude very wide 
inputs.
{quote}
Indeed.

The common web case could easily wind up that way.  You might have a million 
users exploring a billion web pages.
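
That "Omega is just one long" point is worth spelling out, since it is what 
makes wide inputs cheap to handle.  A tiny illustration (names are mine): any 
worker can reconstruct any row of Omega from the shared seed, so the seed is 
the only thing that ever crosses the wire:

{code}
import java.util.Random;

public class SeededOmega {
  /** Row j of the n x (k+p) projection Omega, rebuilt from a shared seed. */
  public static double[] omegaRow(long seed, int j, int r) {
    // Mix the row index into the seed so each row is an independent stream.
    Random gen = new Random(seed ^ (j * 0x9E3779B97F4A7C15L));
    double[] row = new double[r];
    for (int q = 0; q < r; q++) {
      row[q] = gen.nextGaussian();
    }
    return row;
  }
}
{code}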

{quote}
Overall it looks like a great improvement. I am not convinced that it would 
cut processing time that much (it looks like it has the same number of 
proposed MR steps, but all of them would require a shuffle-and-sort and a 
reduce phase, whereas the QR version is a reduce-less process), but it 
definitely reduces the complexity of the MR implementation.
{quote}
This is my thought exactly.  It also decreases the memory load of in-memory 
implementations, which is good.  But as you say later, intuition is 
extraordinarily dangerous for run-times.


> Add new stochastic decomposition code
> -------------------------------------
>
>                 Key: MAHOUT-792
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-792
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>         Attachments: MAHOUT-792.patch, MAHOUT-792.patch, sd-2.pdf
>
>
> I have figured out some simplification for our SSVD algorithms.  This 
> eliminates the QR decomposition and makes life easier.
> I will produce a patch that contains the following:
>   - a CholeskyDecomposition implementation that optionally does pivoting (and 
> is thus rank-revealing).  This should actually be useful for solving large 
> out-of-core least squares problems.
>   - an in-memory SSVD implementation that should work for matrices up to 
> about 1/3 of available memory.
>   - an out-of-core SSVD threaded implementation that should work for very 
> large matrices.  It should take time about equal to the cost of reading the 
> input matrix 4 times and will require working disk roughly equal to the size 
> of the input.
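
A note on how these pieces combine into the final decomposition, following the 
standard stochastic-projection argument rather than the exact steps of the 
patch: once B = Q'A is available, only a small factorization remains, because 
B has at most k+p nonzero singular values:

{code}
A \approx Q Q^{\top} A = Q B, \qquad
B B^{\top} = U_B \Lambda U_B^{\top} \quad ((k+p) \times (k+p) \text{ eigenproblem}),
\Sigma = \Lambda^{1/2}, \quad V = B^{\top} U_B \Sigma^{-1}, \quad U = Q\, U_B
\;\Rightarrow\; A \approx U \Sigma V^{\top}.
{code}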
