See for instance this reference: http://arxiv.org/abs/0710.1435
On Sat, Aug 28, 2010 at 6:45 PM, Ted Dunning <[email protected]> wrote:
>
> LSQR is essentially a Krylov sub-space technique since it is based on the
> Lanczos algorithm. As such, it is highly iterative and requires many
> passes through the data.
>
> It might be more efficient if you could use a random projection technique.
> These algorithms can generally do the work of many Krylov iterations in a
> single pass over the data. I haven't looked into this in detail, but it
> seems to me that a single pass could get you the least-squares fit for the
> first k < 100 singular values, and that subsequent passes could be used on
> the error to get the effect of 100 singular values per iteration.
>
> The crux of the idea is that if you right-multiply your large matrix A by
> a tall skinny random matrix R, you should be able to generate an
> orthonormal matrix Q such that Q Y = A R. The exciting part is that Q Q' A
> will be a close approximation of A in terms of Frobenius norm. Minimizing
> || Q Q' A x - b ||_F should be possible fairly quickly without actually
> constructing Q Q' A because (I think) you should be able to minimize
> || Q' A x - Q' b ||_F instead. Moreover, since Q' A has limited rank, I
> think you can use (Q' A)' (Q' A) to find the answer you want without the
> horrible numerical problems that approach usually entails.
>
> Less directly, you can form the partial SVD and use that to solve for x.
>
> Solving the L_2 regularized system would proceed essentially the same way
> with a fancier value for A. I don't know how this would extend to solving
> the L_1 regularized system.
>
> Since just reading A is the dominant cost, and since you get something
> proportional to k singular vectors in a single determination of Q versus
> something proportional to 1 with each Krylov iteration, this could get you
> two or three orders of magnitude improvement in speed.
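A minimal single-machine sketch of the projection recipe above, in NumPy (the function name, the Gaussian test matrix, and the oversampling parameter p are illustrative assumptions, not Mahout code):

```python
import numpy as np

def randomized_lstsq(A, b, k=10, p=5):
    """Sketch of the projection idea: one multiply A @ R captures the
    dominant range of A; least squares is then solved in that subspace."""
    m, n = A.shape
    R = np.random.randn(n, k + p)        # tall skinny random test matrix
    Y = A @ R                            # single pass over A
    Q, _ = np.linalg.qr(Y)               # orthonormal basis for range(Y)
    # Minimize ||Q' A x - Q' b|| instead of ||Q Q' A x - b||;
    # note that forming Q' A costs one more pass over A.
    B = Q.T @ A
    x, *_ = np.linalg.lstsq(B, Q.T @ b, rcond=None)
    return x
```

When A R captures the full range of A, the returned x satisfies the same normal equations as the exact least-squares fit, at the cost of that second pass to form Q' A.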
>
> As usual, I would refer you to Halko, Martinsson, and Tropp's great survey
> article at http://arxiv.org/abs/0909.4061v1
>
> They refer to some approaches for least squares solutions, but I haven't
> looked at the references that they cite.
>
>
> On Tue, Aug 24, 2010 at 10:21 PM, David G. Boney
> <[email protected]> wrote:
>
>> I assume that the proper etiquette is to send all correspondence to the
>> list, and that side conversations are discouraged.
>>
>> I apologize in advance if I appear ignorant of all that Mahout has to
>> offer. I am currently reading through the Mahout in Action book and
>> looking at the code.
>>
>> The regression algorithm (real inputs, real outputs) that I am working to
>> implement is LSQR. It is an iterative algorithm based on the conjugate
>> gradient method. It has the nice property that it only uses calculations
>> of Ax and A'y, where A is a matrix, A' is its transpose, and x and y are
>> vectors. I would assume that with big data, one wants to avoid
>> calculating the inverse of A since this is O(n^3). Also, with sparse
>> matrices, one wants to avoid calculating AA' because the result can be
>> dense. Below is a link to the algorithm.
>>
>> http://www.stanford.edu/group/SOL/software/lsqr.html
>>
>> One issue I am working on is calculating Ax and A'y with only one pass
>> through A but perhaps multiple passes through x and y. In general, for
>> regression problems A has m rows and n columns where m >> n. The vector x
>> has dimension n and the vector y has dimension m. It might be the case
>> that x will fit in memory, but I think the design should not assume this.
>> I also want to look at the cases where A is dense and where A is sparse.
>>
>> LSQR can be used for maximum likelihood estimation of the parameters of
>> models that are linear in their coefficients. I will first work on linear
>> regression and then investigate the issue of using other basis sets.
>> There may already be work in Mahout to draw on for this issue.
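David's requirement of touching A only through the products Ax and A'y is the same matrix-free contract that SciPy's lsqr exposes through LinearOperator; a small single-machine illustration (the dense stand-in M and the sizes are assumptions, not anything Mahout-specific):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

# LSQR never needs A explicitly, only the products A @ x and A' @ y.
np.random.seed(1)
M = np.random.randn(1000, 50)            # stand-in for data streamed from disk
A = LinearOperator(
    (1000, 50),
    matvec=lambda x: M @ x,              # computes A x
    rmatvec=lambda y: M.T @ y,           # computes A' y
    dtype=np.float64,
)
b = np.random.randn(1000)
x = lsqr(A, b)[0]                        # iterates Ax / A'y until converged
```

In a distributed setting, each iteration's matvec/rmatvec pair is what would become a pass over the row-partitioned data.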
>>
>> I think this will keep me busy for a while.
>>
>> Future issues would include:
>> 1. subset selection using the lasso or ridge regression
>> 2. simple Bayesian regression, for which there may already be work to
>> draw on in Mahout
>> 3. special basis sets
>>
>> My reference for least squares is:
>> Björck, Åke (1996), Numerical Methods for Least Squares Problems
>> http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=numerical+least+squares&x=0&y=0&ih=12_7_1_1_0_1_0_0_1_1.96_104&fsc=-1
>>
>> My reference for numerical linear algebra is:
>> Saad, Yousef (2003), Iterative Methods for Sparse Linear Systems
>> http://www-users.cs.umn.edu/~saad/books.html
>>
>> -------------
>> Sincerely,
>> David G. Boney
>> [email protected]
>> http://www.sonicartifacts.com
>>
>>
>> On Aug 24, 2010, at 9:29 PM, Ted Dunning wrote:
>>
>> > On Tue, Aug 24, 2010 at 1:28 PM, David G. Boney
>> > <[email protected]> wrote:
>> >
>> >> I would like to contribute to Mahout. Who would be the point person on
>> >> the following topics: linear algebra routines, regression (real
>> >> inputs, real outputs), subset selection for regression (lasso or ridge
>> >> regression), and spectral graph methods?
>> >
>> > Several of us can help on linear algebra.
>> >
>> > We have no linear regression to speak of at this point.
>> >
>> > I have done a fair bit of work on gradient descent for regularization.
>> >
>> > We have the beginnings of a spectral clustering model (not the same as
>> > general spectral graph methods) and we have an OK, but not widely used,
>> > large-scale eigenvector decomposer.
>> >
>> >> I am in the process of implementing a least squares linear regression
>> >> algorithm, LSQR. In my quick review of Mahout, and I make no claims of
>> >> digging into the code at this point, there appears to be extensive
>> >> work in the area of classifiers (discrete outputs), but not regression
>> >> (real outputs).
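The ridge item in David's list is the case Ted earlier called "the same way with a fancier value for A": stacking sqrt(lambda) I under A reduces ridge regression to ordinary least squares on an augmented system. A small dense NumPy sketch (function name and shapes are illustrative assumptions):

```python
import numpy as np

def ridge_lstsq(A, b, lam):
    """Ridge regression as ordinary least squares on a stacked system:
    min_x || [A; sqrt(lam) I] x - [b; 0] ||_2  -- the 'fancier A'."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])  # fancier A
    b_aug = np.concatenate([b, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x
```

The same augmentation works matrix-free: an Ax product against the stacked system is just (Ax, sqrt(lam) x), which is why an LSQR-style solver handles the L_2-regularized problem with no structural change.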
>> >> I have an interest in building up a library of regression techniques
>> >> (real inputs, real outputs).
>> >
>> > As far as Mahout is concerned, scalability is the key question. For
>> > many regression problems, especially those with sparse inputs, gradient
>> > descent is very effective, and the current SGD for logistic regression
>> > could probably be leveraged. My guess is that for non-gradient-descent
>> > methods, the SVD decompositions would be a better starting point.
>> >
>> > I believe that Vowpal Wabbit can be used for linear regression, which
>> > probably implies that Mahout's SGD solver could be as well, with small
>> > changes to the gradient computation.
>> >
>> >> I am also interested in the implementation of the numerical linear
>> >> algebra routines, as these algorithms are at the crux of most
>> >> regression problems.
>> >
>> > We would love more help with this, especially in distributed cases. Raw
>> > numerical speed is rarely the bottleneck for Mahout code because
>> > large-scale systems are typically I/O and network bound. That said,
>> > much of our matrix code is still as we inherited it, and while the
>> > quality is surprisingly high for code without unit tests, I know from
>> > direct experience that there are problems. I am testing and correcting
>> > things as I need them, but you would likely have a broader reach and
>> > thus might have substantially more impact than I have had on the
>> > testing side.
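Ted's suggestion that the SGD solver could handle linear regression "with small changes to the gradient computation" amounts to swapping the logistic gradient for the squared-loss one; a toy single-machine sketch (learning rate, epoch count, and names are illustrative assumptions):

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=100):
    """Plain SGD for min_w ||X w - y||^2; relative to the logistic case,
    only the per-example gradient changes: (x_i' w - y_i) * x_i."""
    w = np.zeros(X.shape[1])
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            resid = X[i] @ w - y[i]      # squared-loss residual
            w -= lr * resid * X[i]       # gradient step on one example
    return w
```

Regularization slots in the same way it does for the logistic solver, as an extra shrinkage term in the update.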
