See for instance this reference: http://arxiv.org/abs/0710.1435
On Sat, Aug 28, 2010 at 6:45 PM, Ted Dunning <[email protected]> wrote:
>
> LSQR is essentially a Krylov sub-space technique since it is based on the
> Lanczos algorithm. As such, it is highly iterative and requires many
> passes through the data.
>
> It might be more efficient if you could use a random projection technique.
> These algorithms can generally do the work of many Krylov iterations in a
> single pass over the data. I haven't looked into this in detail, but it
> seems to me that a single pass could get you the least-squares fit for the
> first k < 100 singular values, and that subsequent passes could be used on
> the error to get the effect of 100 singular values per iteration.
>
> The crux of the idea is that if you right-multiply your large matrix A by
> a tall skinny random matrix R, you should be able to generate an
> orthonormal matrix Q such that Q Y = A R. The exciting part is that Q Q' A
> will be a close approximation of A in terms of Frobenius norm. Minimizing
> || Q Q' A x - b ||_F should be possible fairly quickly without actually
> constructing Q Q' A because (I think) you should be able to minimize
> || Q' A x - Q' b ||_F instead. Moreover, since Q' A has limited rank, I
> think you can use (Q' A)' (Q' A) to find the answer you want without the
> horrible numerical problems that approach usually entails.
>
> Less directly, you can form the partial SVD and use that to solve for x.
>
> Solving the L_2 regularized system would proceed essentially the same way
> with a fancier value for A. I don't know how this would extend to solving
> the L_1 regularized system.
>
> Since just reading A is the dominant cost, and since you get something
> proportional to k singular vectors in a single determination of Q versus
> something proportional to 1 with each Krylov iteration, this could get you
> two or three orders of magnitude improvement in speed.
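A minimal single-machine sketch of the projection recipe above, in NumPy (the function name, the Gaussian test matrix, and the oversampling parameter p are illustrative assumptions, not Mahout code):

```python
import numpy as np

def randomized_lstsq(A, b, k=10, p=5):
    """Sketch of the projection idea: one multiply A @ R captures the
    dominant range of A; least squares is then solved in that subspace."""
    m, n = A.shape
    R = np.random.randn(n, k + p)        # tall skinny random test matrix
    Y = A @ R                            # single pass over A
    Q, _ = np.linalg.qr(Y)               # orthonormal basis for range(Y)
    # Minimize ||Q' A x - Q' b|| instead of ||Q Q' A x - b||;
    # note that forming Q' A costs one more pass over A.
    B = Q.T @ A
    x, *_ = np.linalg.lstsq(B, Q.T @ b, rcond=None)
    return x
```

When A R captures the full range of A, the returned x satisfies the same normal equations as the exact least-squares fit, at the cost of that second pass to form Q' A.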
>
> As usual, I would refer you to Halko, Martinsson, and Tropp's great survey
> article at http://arxiv.org/abs/0909.4061v1
>
> They refer to some approaches for least squares solutions, but I haven't
> looked at the references that they cite.
>
>
> On Tue, Aug 24, 2010 at 10:21 PM, David G. Boney
> <[email protected]> wrote:
>
>> I assume that the proper etiquette is to send all correspondence to the
>> list, and that side conversations are discouraged.
>>
>> I apologize in advance if I appear ignorant of all that Mahout has to
>> offer. I am currently reading through the Mahout in Action book and
>> looking at the code.
>>
>> The regression algorithm (real inputs, real outputs) that I am working to
>> implement is LSQR. It is an iterative algorithm based on the conjugate
>> gradient method. It has the nice property that it only uses calculations
>> of Ax and A'y, where A is a matrix, A' is its transpose, and x and y are
>> vectors. I would assume that with big data, one wants to avoid
>> calculating the inverse of A since this is O(n^3). Also, with sparse
>> matrices, one wants to avoid calculating AA' because the result can be
>> dense. Below is a link to the algorithm.
>>
>> http://www.stanford.edu/group/SOL/software/lsqr.html
>>
>> One issue I am working on is calculating Ax and A'y with only one pass
>> through A but perhaps multiple passes through x and y. In general, for
>> regression problems A has m rows and n columns where m >> n. The vector x
>> has dimension n and the vector y has dimension m. It might be the case
>> that x will fit in memory, but I think the design should not assume this.
>> I also want to look at the cases where A is dense and where A is sparse.
>>
>> LSQR can be used for maximum likelihood estimation of the parameters of
>> models that are linear in their coefficients. I will first work on linear
>> regression and then investigate the issue of using other basis sets.
>> There may already be work in Mahout to draw on for this issue.
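David's requirement of touching A only through the products Ax and A'y is the same matrix-free contract that SciPy's lsqr exposes through LinearOperator; a small single-machine illustration (the dense stand-in M and the sizes are assumptions, not anything Mahout-specific):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

# LSQR never needs A explicitly, only the products A @ x and A' @ y.
np.random.seed(1)
M = np.random.randn(1000, 50)            # stand-in for data streamed from disk
A = LinearOperator(
    (1000, 50),
    matvec=lambda x: M @ x,              # computes A x
    rmatvec=lambda y: M.T @ y,           # computes A' y
    dtype=np.float64,
)
b = np.random.randn(1000)
x = lsqr(A, b)[0]                        # iterates Ax / A'y until converged
```

In a distributed setting, each iteration's matvec/rmatvec pair is what would become a pass over the row-partitioned data.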
>>
>> I think this will keep me busy for a while.
>>
>> Future issues would include:
>> 1. subset selection using the lasso or ridge regression
>> 2. simple Bayesian regression, for which there may already be work to
>> draw on in Mahout
>> 3. special basis sets
>>
>> My reference for least squares is:
>> Björck, Åke (1996), Numerical Methods for Least Squares Problems
>> http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords=numerical+least+squares&x=0&y=0&ih=12_7_1_1_0_1_0_0_1_1.96_104&fsc=-1
>>
>> My reference for numerical linear algebra is:
>> Saad, Yousef (2003), Iterative Methods for Sparse Linear Systems
>> http://www-users.cs.umn.edu/~saad/books.html
>>
>> -------------
>> Sincerely,
>> David G. Boney
>> [email protected]
>> http://www.sonicartifacts.com
>>
>>
>> On Aug 24, 2010, at 9:29 PM, Ted Dunning wrote:
>>
>> > On Tue, Aug 24, 2010 at 1:28 PM, David G. Boney
>> > <[email protected]> wrote:
>> >
>> >> I would like to contribute to Mahout. Who would be the point person on
>> >> the following topics: linear algebra routines, regression (real
>> >> inputs, real outputs), subset selection for regression (lasso or ridge
>> >> regression), and spectral graph methods?
>> >
>> > Several of us can help on linear algebra.
>> >
>> > We have no linear regression to speak of at this point.
>> >
>> > I have done a fair bit of work on gradient descent for regularization.
>> >
>> > We have the beginnings of a spectral clustering model (not the same as
>> > general spectral graph methods) and we have an OK, but not widely used,
>> > large-scale eigenvector decomposer.
>> >
>> >> I am in the process of implementing a least squares linear regression
>> >> algorithm, LSQR. In my quick review of Mahout, and I make no claims of
>> >> digging into the code at this point, there appears to be extensive
>> >> work in the area of classifiers (discrete outputs), but not regression
>> >> (real outputs).
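The ridge item in David's list is the case Ted earlier called "the same way with a fancier value for A": stacking sqrt(lambda) I under A reduces ridge regression to ordinary least squares on an augmented system. A small dense NumPy sketch (function name and shapes are illustrative assumptions):

```python
import numpy as np

def ridge_lstsq(A, b, lam):
    """Ridge regression as ordinary least squares on a stacked system:
    min_x || [A; sqrt(lam) I] x - [b; 0] ||_2  -- the 'fancier A'."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n)])  # fancier A
    b_aug = np.concatenate([b, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x
```

The same augmentation works matrix-free: an Ax product against the stacked system is just (Ax, sqrt(lam) x), which is why an LSQR-style solver handles the L_2-regularized problem with no structural change.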
>> >> I have an interest in building up a library of regression techniques
>> >> (real inputs, real outputs).
>> >
>> > As far as Mahout is concerned, scalability is the key question. For
>> > many regression problems, especially those with sparse inputs, gradient
>> > descent is very effective, and the current SGD for logistic regression
>> > could probably be leveraged. My guess is that for non-gradient-descent
>> > methods, the SVD decompositions would be a better starting point.
>> >
>> > I believe that Vowpal Wabbit can be used for linear regression, which
>> > probably implies that Mahout's SGD solver could be as well, with small
>> > changes to the gradient computation.
>> >
>> >> I am also interested in the implementation of the numerical linear
>> >> algebra routines, as these algorithms are at the crux of most
>> >> regression problems.
>> >
>> > We would love more help with this, especially in distributed cases. Raw
>> > numerical speed is rarely the bottleneck for Mahout code because
>> > large-scale systems are typically I/O and network bound. That said,
>> > much of our matrix code is still as we inherited it, and while the
>> > quality is surprisingly high for code without unit tests, I know from
>> > direct experience that there are problems. I am testing and correcting
>> > things as I need them, but you would likely have a broader reach and
>> > thus might have substantially more impact than I have had on the
>> > testing side.
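Ted's suggestion that the SGD solver could handle linear regression "with small changes to the gradient computation" amounts to swapping the logistic gradient for the squared-loss one; a toy single-machine sketch (learning rate, epoch count, and names are illustrative assumptions):

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, epochs=100):
    """Plain SGD for min_w ||X w - y||^2; relative to the logistic case,
    only the per-example gradient changes: (x_i' w - y_i) * x_i."""
    w = np.zeros(X.shape[1])
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            resid = X[i] @ w - y[i]      # squared-loss residual
            w -= lr * resid * X[i]       # gradient step on one example
    return w
```

Regularization slots in the same way it does for the logistic solver, as an extra shrinkage term in the update.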
