[ https://issues.apache.org/jira/browse/MAHOUT-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091927#comment-13091927 ]

Dmitriy Lyubimov edited comment on MAHOUT-792 at 8/26/11 6:38 PM:
------------------------------------------------------------------

more thoughts: 

1 -- it looks like this method compresses a lot of information into the 
(k+p) x (k+p) upper-triangular R, so I would guess there is more potential for 
rounding errors there (I am not putting too much weight on rounding errors, 
though). See the short recap after point 3 below. 

2 -- The argument that the Q matrix is large is not very convincing to me. Q is 
m x (k+p); assuming A is much wider than k+p (in terms of the average number of 
non-zero elements per row), which it should be for the projection method to 
make sense at all, Q is only a fraction of A. Besides, Q is written in blocks 
by mappers and read back by mappers, so each mapper sees only one block of, 
say, 30,000 x (k+p) (and likely fewer rows than that if A is sufficiently 
wide); with k+p = 500 that is 30,000 x 500 doubles at 8 bytes each, i.e. about 
120 MB per mapper. So there is not that much actual memory pressure; that is 
what distributed computations are for. The same goes for dense operations: we 
just distribute them. 

3 -- it looks like you are writing B as the output of the second MR, and B is 
(k+p) x n. Going back to the 'big Q' argument, we can't assume that B would be 
any smaller. In fact, some time ago I came to the conclusion that the 
projection method would be much more efficient if the input is wide rather than 
tall, since the projection compression factor is much better for what seems to 
be a fairly inexpensive operation (it costs nothing to redistribute Omega, 
which is backed by nothing more than a single long used as a random generator 
seed, as opposed to actual tall or wide bands such as B or Q; see the Omega 
sketch below). So we can't exclude very wide inputs. 
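
To make point 1 concrete, this is the construction I have in mind; it is only 
my reading of sd-2.pdf, assuming the usual Cholesky trick is what replaces the 
QR step, so the exact formulation there may differ:

  % my assumed reading of the proposal (Cholesky-QR style construction):
  \[
    Y = A\,\Omega \in \mathbb{R}^{m \times (k+p)}, \qquad
    Y^{\top}Y = R^{\top}R \;\text{(Cholesky)}, \qquad
    Q = Y R^{-1}, \quad B = Q^{\top}A \in \mathbb{R}^{(k+p) \times n}
  \]

All of the column-space information in Y has to pass through the small R, and 
forming Y^T Y roughly squares the condition number, which is where the (mild) 
rounding-error exposure mentioned in point 1 would come from.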
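
And to illustrate the Omega remark in point 3: the projection matrix never has 
to be materialized or shipped anywhere; every mapper can regenerate any row of 
it on the fly from one shared seed. A minimal sketch (hypothetical class and 
method names, not the actual Mahout Omega API):

  import java.util.Random;

  // Hypothetical sketch, not Mahout's real Omega implementation.
  // Omega exists only as a seed plus a width; any row can be recomputed
  // deterministically on any node, so "redistributing" it costs nothing.
  public final class SeededOmega {
    private final long seed;
    private final int kp;   // k + p, the projection width

    public SeededOmega(long seed, int kp) {
      this.seed = seed;
      this.kp = kp;
    }

    // Regenerates row 'rowIndex' of Omega; identical result on every mapper.
    public double[] row(long rowIndex) {
      Random rnd = new Random(seed ^ (rowIndex * 0x9E3779B97F4A7C15L));
      double[] r = new double[kp];
      for (int i = 0; i < kp; i++) {
        r[i] = rnd.nextGaussian();
      }
      return r;
    }
  }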


Overall it looks like a great improvement. I am not convinced that it would cut 
processing time that much (it seems to have the same number of proposed MR 
steps, but all of them look like they would require a shuffle-and-sort and a 
reduce phase, whereas the QR-based pipeline is a reduce-less process), but it 
definitely reduces the complexity of the MR implementation, and I would be very 
eager to try it out. Certainly all I said is first impression and intuition 
only; and in my experience intuition turns out to be wrong surprisingly often 
as far as benchmark guesstimates are concerned.

> Add new stochastic decomposition code
> -------------------------------------
>
>                 Key: MAHOUT-792
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-792
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Ted Dunning
>         Attachments: MAHOUT-792.patch, MAHOUT-792.patch, sd-2.pdf
>
>
> I have figured out some simplification for our SSVD algorithms.  This 
> eliminates the QR decomposition and makes life easier.
> I will produce a patch that contains the following:
>   - a CholeskyDecomposition implementation that does pivoting (and is thus 
> rank-revealing) or not.  This should actually be useful for the solution of 
> large out-of-core least squares problems.
>   - an in-memory SSVD implementation that should work for matrices up to 
> about 1/3 of available memory.
>   - an out-of-core SSVD threaded implementation that should work for very 
> large matrices.  It should take time about equal to the cost of reading the 
> input matrix 4 times and will require working disk roughly equal to the size 
> of the input.
