I am going to stick with LSQR. Part of this is an exercise to learn how to use Hadoop and Mahout. I also consulted with Dianne O'Leary at the University of Maryland, a leading numerical linear algebra researcher (and a former professor of mine). She highly recommended the algorithm. I want to start with something well known and reliable.
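To make the interface concrete, here is a minimal sketch, assuming SciPy's lsqr and NumPy purely as stand-ins for an eventual Mahout implementation: LSQR only ever touches A through the products Ax and A'y, so A can sit behind a matrix-free operator whose two callbacks would each be one pass over the rows of A. The sizes and names below are illustrative.

import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 50))   # stand-in for a large m x n matrix, m >> n
b = rng.standard_normal(1000)

# Expose A only through matvec (Ax) and rmatvec (A'y); a distributed
# implementation would back these two callbacks with passes over the rows of A.
op = LinearOperator(shape=A.shape,
                    matvec=lambda x: A @ x,     # computes Ax
                    rmatvec=lambda y: A.T @ y)  # computes A'y

x = lsqr(op, b)[0]
print(np.linalg.norm(A @ x - b))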
I am very interested in making additional contributions to the numerical linear algebra system. I will gladly review the method you suggested for future projects. Right now I am reading up on iterative methods in:

Henk A. van der Vorst, (2009), Iterative Krylov Methods for Large Linear Systems.
http://www.amazon.com/Iterative-Cambridge-Monographs-Computational-Mathematics/dp/0521183707/ref=sr_1_3?ie=UTF8&s=books&qid=1283044114&sr=8-3

-------------
Sincerely,
David G. Boney
[email protected]
http://www.sonicartifacts.com

On Aug 28, 2010, at 7:45 PM, Ted Dunning wrote:

> LSQR is essentially a Krylov subspace technique since it is based on the Lanczos algorithm. As such, it is highly iterative and requires many passes through the data.
>
> It might be more efficient if you could use a random projection technique. These algorithms can generally do the work of many Krylov iterations in a single pass over the data. I haven't looked into this in detail, but it seems to me that a single pass could get you the least-squares fit for the first k < 100 singular values and that subsequent passes could be used on the error to get the effect of 100 singular values per iteration.
>
> The crux of the idea is that if you right-multiply your large matrix A by a tall skinny random matrix R, you should be able to generate an orthonormal matrix Q such that Q Y = A R. The exciting part is that Q Q' A will be a close approximation of A in terms of Frobenius norm. Minimizing || Q Q' A x - b ||_F should be possible fairly quickly without actually constructing Q Q' A because (I think) you should be able to minimize || Q' A x - Q' b ||_F instead. Moreover, since Q' A has limited rank, I think you can use (Q' A)' (Q' A) to find the answer you want without the horrible numerical problems that approach usually entails.
>
> Less directly, you can form the partial SVD and use that to solve for x.
>
> Solving the L_2 regularized system would proceed essentially the same way with a fancier value for A. I don't know how this would extend to solving the L_1 regularized system.
>
> Since just reading A is the dominant cost, and since you get something proportional to k singular vectors in a single determination of Q versus something proportional to 1 with each Krylov iteration, this could get you two or three orders of magnitude improvement in speed.
>
> As usual, I would refer you to Halko, Martinsson, and Tropp's great survey article at http://arxiv.org/abs/0909.4061v1
>
> They refer to some approaches for least squares solutions, but I haven't looked at the references that they cite.
>
>
> On Tue, Aug 24, 2010 at 10:21 PM, David G. Boney <[email protected]> wrote:
>
>> I assume that the proper etiquette is to send all correspondence to the list, and that side conversations are discouraged.
>>
>> I apologize in advance if I appear ignorant of all that Mahout has to offer. I am currently reading through the book Mahout in Action and looking at the code.
>>
>> The regression algorithm (real inputs, real outputs) that I am working to implement is LSQR. It is an iterative algorithm based on the conjugate gradient method. It has a nice property that it only uses calculations of Ax and A'y, where A is a matrix, A' is the transpose, and x & y are vectors. I would assume that with big data, one wants to avoid calculating A inverse since this is O(n^3).
>> Also, with sparse matrices, one wants to avoid calculating AA' because the result can be dense. Below is a link to the algorithm.
>>
>> http://www.stanford.edu/group/SOL/software/lsqr.html
>>
>> One issue I am working on is calculating Ax and A'y with only one pass through A but perhaps multiple passes through x and y. In general, for regression problems A is m rows and n columns where m >> n. The vector x has the dimension n and the vector y has the dimension m. It might be the case that x will fit in memory, but I think the design should not assume this. I also want to look at the issues where A is dense and where A is sparse.
>>
>> LSQR can be used for maximum likelihood estimation of parameters of models that are linear in their coefficients. I will first work on linear regression and then investigate the issue of using other basis sets. There may already be work in Mahout to draw on for this issue.
>>
>> I think this will keep me busy for a while.
>>
>> Future issues would include:
>> 1. subset selection using the lasso or ridge regression
>> 2. simple Bayesian regression, for which there may already be work to draw on in Mahout
>> 3. special basis sets
>>
>> My reference for least squares is:
>> Bjorck, Ake, (1996), Numerical Methods for Least Squares Problems
>> http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=numerical+least+squares&x=0&y=0&ih=12_7_1_1_0_1_0_0_1_1.96_104&fsc=-1
>>
>> My reference for numerical linear algebra is:
>> Saad, Yousef, (2003), Iterative Methods for Sparse Linear Systems
>> http://www-users.cs.umn.edu/~saad/books.html
>>
>> -------------
>> Sincerely,
>> David G. Boney
>> [email protected]
>> http://www.sonicartifacts.com
>>
>>
>> On Aug 24, 2010, at 9:29 PM, Ted Dunning wrote:
>>
>>> On Tue, Aug 24, 2010 at 1:28 PM, David G. Boney <[email protected]> wrote:
>>>
>>>> I would like to contribute to Mahout. Who would be the point person on the following topics: linear algebra routines, regression (real inputs, real outputs), subset selection for regression (lasso or ridge regression), and spectral graph methods?
>>>
>>> Several of us can help on linear algebra.
>>>
>>> We have no linear regression to speak of at this point.
>>>
>>> I have done a fair bit of work on gradient descent for regularization.
>>>
>>> We have the beginnings of a spectral clustering model (not the same as general spectral graph methods) and we have an OK, but not widely used, large-scale eigenvector decomposer.
>>>
>>>> I am in the process of implementing a least squares linear regression algorithm, LSQR. In my quick review of Mahout, and I make no claims of digging into the code at this point, there appears to be extensive work in the area of classifiers, discrete outputs, but not regression, real output. I have an interest in building up a library of regression techniques (real inputs, real outputs).
>>>
>>> As far as Mahout is concerned, scalability is the key question. For many regression problems, especially those with sparse inputs, gradient descent is very effective and the current SGD for logistic regression could probably be leveraged. My guess is that for non-gradient descent methods, the SVD decompositions would be a better starting point.
>>>
>>> I believe that Vowpal Wabbit can be used for linear regression, which probably implies that Mahout's SGD solver could be as well with small changes to the gradient computation.
>>>
>>>> I am also interested in the implementation of the numerical linear algebra routines, as these algorithms are at the crux of most regression problems.
>>>
>>> We would love more help with this, especially in distributed cases. Raw numerical speed is rarely the bottleneck for Mahout code because large-scale systems are typically I/O and network bound. That said, much of our matrix code is still as we inherited it, and while the quality is surprisingly high for code without unit tests, I know from direct experience that there are problems. I am testing and correcting things as I need them, but you would likely have a broader reach and thus might have substantially more impact than I have had on the testing side.
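For reference, a rough NumPy sketch of the random projection idea from the quoted message above. The sizes, the choice of k, and the use of lstsq on the small projected system are illustrative assumptions, not anything settled in this thread; it only shows the shape of the computation: sample the range of A with a random R, orthonormalize, then minimize || Q'A x - Q'b || instead of working with Q Q' A directly.

import numpy as np

rng = np.random.default_rng(0)
m, n, k = 5000, 300, 100                 # m >> n; k = number of directions to capture
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

R = rng.standard_normal((n, k))          # tall skinny random matrix
Y = A @ R                                # one pass over A
Q, _ = np.linalg.qr(Y)                   # orthonormal Q with Q Q' A ~ A

# Solve the small projected problem min || Q'A x - Q'b || rather than
# constructing Q Q' A explicitly; forming Q'A needs a second pass over A.
QtA = Q.T @ A
Qtb = Q.T @ b
x, *_ = np.linalg.lstsq(QtA, Qtb, rcond=None)

print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))  # residual of the approximate fit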

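And a small sketch, again plain NumPy rather than Hadoop code, of the one-pass question raised in the quoted thread: when A is streamed row by row, row a_i contributes a_i . x to (Ax)_i and y_i * a_i to A'y, so both products fall out of a single pass over A. The function and variable names are made up for illustration.

import numpy as np

def single_pass_products(rows, x, y):
    """Stream the rows of A once and return (A x, A' y)."""
    Ax = []
    Aty = np.zeros_like(x, dtype=float)
    for i, a_i in enumerate(rows):     # one pass over the rows of A
        Ax.append(a_i @ x)             # (A x)_i = a_i . x
        Aty += y[i] * a_i              # accumulate A' y
    return np.array(Ax), Aty

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 20))
x = rng.standard_normal(20)
y = rng.standard_normal(1000)

Ax, Aty = single_pass_products(iter(A), x, y)
assert np.allclose(Ax, A @ x) and np.allclose(Aty, A.T @ y)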