If we use R directly as the transformation, then you are correct that the
eigenvalues are not exactly preserved, even though they may be well
approximated.  For very high-dimensional cases, the preservation is pretty
good.  I have attached a graph that illustrates how well the eigenvalues of
R' A' A R approximate the actual eigenvalues for a 1000-dimensional, rank-30
example.  (The attachment may be stripped ... if so and anybody wants to see
it, I can put it somewhere more permanent.)
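
If anybody wants to play with this numerically, here is a rough numpy sketch
of that comparison (this is not the code behind the graph, just an
illustration; the 1/sqrt(k) scaling of R below is one choice that keeps the
two sets of eigenvalues on the same scale):

    import numpy as np

    n, rank, k = 1000, 30, 100   # ambient dimension, true rank, projection width (k > rank, made up)

    # A 1000-dimensional, rank-30 matrix.
    A = np.random.randn(n, rank) @ np.random.randn(rank, n)

    # Random projection; the 1/sqrt(k) scaling keeps the two eigenvalue sets comparable.
    R = np.random.randn(n, k) / np.sqrt(k)

    # Eigenvalues of R' A' A R vs. the true eigenvalues of A' A (squared singular values of A).
    approx = np.sort(np.linalg.eigvalsh(R.T @ (A.T @ A) @ R))[::-1][:rank]
    exact = np.sort(np.linalg.svd(A, compute_uv=False))[::-1][:rank] ** 2

    print(np.c_[exact, approx])   # close, but not exactly preserved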

On the other hand, if you do find an orthonormal basis of AR, you get pretty
much exact reconstruction.  The fact is, this is pretty easy to do and no
special choice of R is required (other than having more vectors than
strictly necessary).  That is pretty surprising, so the paper goes to great
lengths to demonstrate how robust this operation is.
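
Concretely, something like the following (again just a toy numpy sketch):

    import numpy as np

    n, rank, k = 1000, 30, 100
    A = np.random.randn(n, rank) @ np.random.randn(rank, n)   # rank-30 matrix, as above
    R = np.random.randn(n, k)        # any generic R with more columns than the rank

    Q, _ = np.linalg.qr(A @ R)       # orthonormal basis for the column space of AR
    A_hat = Q @ (Q.T @ A)            # project A onto span(Q)

    print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))   # essentially machine precision

Because A really has rank 30 and R has more columns than that, span(Q) almost
surely contains the whole column space of A, so the projection loses nothing,
and the scaling of R drops out entirely.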

I don't remember all that much discussion of doing anything specific about
aligning Q with the left eigenvectors.  In fact, I don't think it is really
even necessary.  Can you point to a particular point in the paper?

On Sun, Mar 28, 2010 at 9:09 AM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> Thank you, Ted.
>
> I guess I have a little trouble visualizing all of it.  The original work
> goes to great lengths to preserve 'the action' of the original matrix, which
> I can understand.  The way I visualize it: let's say we take a hyperplane we
> project into with the help of R'.  But since R' is not fully orthogonal, it
> is not an isometry, which means some of the left eigenvectors may get
> 'squashed' more than others, which means the ordering of the principal
> directions may be lost.  That's how I tried to visualize it before, and this
> is what I found confirmation for in the paper, with all this talk about
> aligning Q columns with left eigenvectors of A (not that I follow how they
> actually achieve that).
>
> Well, I guess I don't really have to understand all of it.  Thank you very
> much for taking the time to explain.
>
> -Dmitriy
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunn...@gmail.com]
> Sent: Saturday, March 27, 2010 11:59 PM
> To: mahout-dev@lucene.apache.org
> Subject: Re: Stochastic SVD
>
> Dmitriy,
>
> The current approach that Jake is working on is to form R' A' A R in a
> single pass over A, where R is a suitable random matrix.  This is a
> relatively small symmetric matrix, so we can find V and S^2 such that
> R' A' A R = V S^2 V'.  This also gives us the SVD of AR, via U = (AR) V S^-1.
> U is the desired orthonormal basis for the column space of AR and can be
> used as Q in the algorithms.  Jake is talking about storing AR as a separate
> output of the first pass, but a second pass over A would work just as well.
>
> Since R has more columns than the number of singular values we want to
> extract from A, we can be pretty sure that AR spans the eigenvectors of
> interest.
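
To make the linear algebra here concrete, it looks roughly like this in numpy
(just a sketch of the math, not the Hadoop code Jake is writing; the sizes
are made up):

    import numpy as np

    n, m, k = 2000, 1000, 100   # rows of A, columns of A, width of R (made-up sizes)
    A = np.random.randn(n, 20) @ np.random.randn(20, m)   # some low-rank-ish data
    R = np.random.randn(m, k)

    AR = A @ R                  # kept as a separate output (or recomputed in a second pass)
    G = AR.T @ AR               # R' A' A R, small (k x k) and symmetric

    # Eigendecomposition of the small matrix: R' A' A R = V S^2 V'
    s2, V = np.linalg.eigh(G)
    s2, V = s2[::-1], V[:, ::-1]       # descending order
    S = np.sqrt(np.clip(s2, 0.0, None))

    # SVD of AR: U = (AR) V S^-1; drop numerically-zero directions first.
    keep = S > 1e-10 * S[0]
    U = (AR @ V[:, keep]) / S[keep]

    print(np.allclose(U.T @ U, np.eye(keep.sum())))   # U has orthonormal columns; use it as Q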
>
> The problem that I see with the Gram-Schmidt approach is that it assumes
> that you have the y(i) one at a time, while the practical situation is that
> you get rows of Y rather than columns.  My guess is that you can do some
> trickiness here, and that the Gram-Schmidt approach would give better
> accuracy, but that the simplicity of the R' A' A R approach may compensate
> in practice.
>
>
> On Sat, Mar 27, 2010 at 10:14 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
>
> > Hi,
> >
> > So I am trying to read a little deeper into this and was wondering if you
> > could give me some of your insight into this.
> >
> > So we construct a random projection; that's clear, and that will require
> > one pass over the original matrix (or the covariance matrix in the case of
> > PCA).
> >
> > But... it goes on to say that before we project onto Q, we need to make
> > sure its span captures the principal left eigenvectors, which in turn
> > suggests (per alg. 4.1) another pass over A and then running a Gram-Schmidt
> > procedure.
> >
> > So what's your insight about this --
> >
> > 1) Do we really need to do two passes over A to get to the projection?  If
> > not, why not?
> >
> > 2) How do we orthonormalize Q efficiently?  They seem to have mentioned (so
> > far) that they themselves used Gram-Schmidt with double orthogonalization
> > for that purpose.  I am not familiar with this particular variant; can we
> > do it in a non-iterative way?  I am only familiar with 'standard'
> > Gram-Schmidt, which is strictly iterative.  But iterativeness is averse to
> > MR (and even if it weren't, that probably would mean we would need to run
> > another MR here just to orthonormalize Q, unless again we can figure out
> > how to combine that with the projection job).
> >
> > I guess it still may be simplified further; I haven't finished reading all
> > of it, but if there are answers further in the paper, do you think you
> > could point me to the section that talks about the variation that is most
> > suitable for MR?
> >
> >
> > Thank you very much.
> >
> > -Dmitriy
> >
> > On Mon, Mar 22, 2010 at 8:12 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> >
> > > Actually, maybe what you were thinking (at least, what *I* am thinking)
> > > is that you can indeed do it in one pass through the *original* data
> > > (i.e. you can get away with never keeping a handle on the original data
> > > itself), because on the "one pass" through that data, you spit out
> > > MultipleOutputs: one SequenceFile of the randomly projected data, which
> > > doesn't hit a reducer at all, and a second output which is the outer
> > > product of those vectors with themselves, which hits a summing reducer.
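
Just to make that bookkeeping concrete, the per-row logic looks roughly like
this in numpy (a toy stand-in, obviously, for a Hadoop mapper with
MultipleOutputs plus a summing reducer; the sizes are made up):

    import numpy as np

    m, k = 1000, 100                 # number of columns of A, width of R (made up)
    R = np.random.randn(m, k)

    projected_rows = []              # stands in for the SequenceFile of projected rows (rows of AR)
    G = np.zeros((k, k))             # stands in for the summing reducer's output: R' A' A R

    def map_row(a_row):
        """Per-row work of the single pass: emit the projected row and its outer product."""
        y = a_row @ R                # y_i = a_i R, one row of AR
        projected_rows.append(y)     # first output: projected data, never touches a reducer
        return np.outer(y, y)        # second output: y_i y_i', summed over all rows

    A = np.random.randn(5000, m)     # the "original data", one pass over its rows
    for a_row in A:
        G += map_row(a_row)

    print(np.allclose(G, R.T @ (A.T @ A) @ R))   # the summed outer products are exactly R' A' A R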
> > >
> > > In this sense, while you need to pass over the original data's *size*
> > > (in terms of number of rows) a second time, if you want to consider
> > > it data to be played with (instead of just "training" data for use on a
> > > smaller subset or even a totally different set), you don't need to pass
> > > over the entire original data *set* ever again.
> > >
> > >  -jake
> > >
> > > On Mon, Mar 22, 2010 at 6:35 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > >
> > > > You are probably right.  I had a wild hare tromp through my thoughts the
> > > > other day saying that one pass should be possible, but I can't
> > > > reconstruct the details just now.
> > > >
> > > > On Mon, Mar 22, 2010 at 6:00 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> > > >
> > > > > I guess if you mean just do a random projection on the original data,
> > > > > you can certainly do that in one pass, but that's random projection,
> > > > > not a stochastic decomposition.
> > > > >
> > > >
> > >
> >
>
>
