On Tue, Jan 19, 2010 at 8:13 PM, Charles R Harris <[email protected]> wrote:
>
>
> On Tue, Jan 19, 2010 at 8:02 PM, <[email protected]> wrote:
>
>> On Tue, Jan 19, 2010 at 9:47 PM, Charles R Harris
>> <[email protected]> wrote:
>> >
>> >
>> > On Tue, Jan 19, 2010 at 6:08 PM, <[email protected]> wrote:
>> >>
>> >> On Tue, Jan 19, 2010 at 6:48 PM, Charles R Harris
>> >> <[email protected]> wrote:
>> >> >
>> >> >
>> >> > On Tue, Jan 19, 2010 at 2:34 PM, <[email protected]> wrote:
>> >> >>
>> >> >> On Tue, Jan 19, 2010 at 4:29 PM, Charles R Harris
>> >> >> <[email protected]> wrote:
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Jan 19, 2010 at 1:47 PM, Gael Varoquaux
>> >> >> > <[email protected]> wrote:
>> >> >> >>
>> >> >> >> On Tue, Jan 19, 2010 at 02:22:30PM -0600, Robert Kern wrote:
>> >> >> >> > > y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0])
>> >> >> >>
>> >> >> >> > > with np = numpy and linalg = scipy.linalg where scipy calls
>> >> >> >> > > ATLAS.
>> >> >> >>
>> >> >> >> > For clarification, are you trying to find the components of the y
>> >> >> >> > vectors that are perpendicular to the space spanned by the 10
>> >> >> >> > orthonormal vectors in confounds?
>> >> >> >>
>> >> >> >> Yes. Actually, what I am doing is calculating the partial correlation
>> >> >> >> between x and y conditional on confounds, with the following code:
>> >> >> >>
>> >> >> >> def cond_partial_cor(y, x, confounds=[]):
>> >> >> >>     """ Returns the partial correlation of y and x, conditioning
>> >> >> >>         on confounds.
>> >> >> >>     """
>> >> >> >>     # First orthogonalise y and x relative to confounds
>> >> >> >>     if len(confounds):
>> >> >> >>         y = y - np.dot(confounds.T, linalg.lstsq(confounds.T, y)[0])
>> >> >> >>         x = x - np.dot(confounds.T, linalg.lstsq(confounds.T, x)[0])
>> >> >> >>     return np.dot(x, y) / sqrt(np.dot(y, y) * np.dot(x, x))
>> >> >> >>
>> >> >> >> I am not sure that what I am doing is optimal.
>> >> >> >>
>> >> >> >> > > Most of the time is spent in linalg.lstsq. The length of the
>> >> >> >> > > vectors is 810, and there are about 10 confounds.
>> >> >> >>
>> >> >> >> > Exactly what are the shapes? y.shape = (810, N);
>> >> >> >> > confounds.shape = (810, 10)?
>> >> >> >>
>> >> >> >> Sorry, I should have been more precise:
>> >> >> >>
>> >> >> >> y.shape = (810,)
>> >> >> >> confounds.shape = (10, 810)
>> >> >> >>
>> >> >> >
>> >> >> > Column stack the bunch so that the last column is y, then do a QR
>> >> >> > decomposition. The last column of Q is the (normalized) orthogonal
>> >> >> > vector and its amplitude is the last (bottom right) component of R.
>> >> >>
>> >> >> Do you have to do QR twice, once with x and once with y in the last
>> >> >> column, or can this be combined?
>> >> >>
>> >> >> I was trying to do something similar for the partial autocorrelation of
>> >> >> time series but didn't manage or try anything better than repeated
>> >> >> leastsq or a variant.
>> >> >>
>> >> >
>> >> > Depends on what you want to do. The QR decomposition is essentially
>> >> > Gram-Schmidt on the columns. So if you just want an orthonormal basis
>> >> > for the subspace spanned by a bunch of columns, the columns of Q are
>> >> > exactly that. To get the part of y orthogonal to that subspace you can
>> >> > do y - Q*Q.T*y, which is probably what you want if the x's are fixed
>> >> > and the y's vary. If there is just one y, then putting it as the last
>> >> > column lets the QR algorithm do that last bit of projection.
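For concreteness, the y - Q*Q.T*y projection described above might look something like the following minimal sketch, assuming the shapes Gael gives (y and x of shape (810,), confounds of shape (10, 810)). The function name cond_partial_cor_qr and the choice of numpy.linalg.qr are illustrative, not code from the thread:

import numpy as np

def cond_partial_cor_qr(y, x, confounds):
    """Partial correlation of y and x, conditioning on confounds,
    using one QR factorization of the confounds instead of two
    lstsq calls (a sketch of the projection described above).

    Assumed shapes: y.shape == x.shape == (n,), confounds.shape == (k, n).
    """
    # Q: orthonormal basis (n, k) for the span of the confound vectors.
    q, _ = np.linalg.qr(confounds.T)
    # Project out the confound subspace: v - Q (Q.T v).
    y = y - np.dot(q, np.dot(q.T, y))
    x = x - np.dot(q, np.dot(q.T, x))
    return np.dot(x, y) / np.sqrt(np.dot(y, y) * np.dot(x, x))

Since Q is computed once from the confounds and then reused for x, y, and any further vectors, this avoids one lstsq solve per vector, which is where the time appears to be going in the original version.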
>> >>
>> >> Gram-Schmidt (looking at it for the first time) looks a lot like
>> >> sequential least squares projection. So I'm trying to figure out if I
>> >> can use the partial results up to a specific column as partial least
>> >> squares and then work my way to the end by including/looking at more
>> >> columns.
>> >>
>> >
>> > I don't think the QR factorization would work for normal PLS. IIRC, one
>> > of the algorithms does an svd of the cross-correlation matrix. The
>> > difference is that in some sense the svd picks out the best linear
>> > combination of columns, while the QR factorization without column
>> > pivoting just takes them in order. The QR factorization used to be the
>> > method of choice for least squares because it is straightforward to
>> > compute, no iterating needed as in the svd, but these days that
>> > advantage is pretty much gone. It is still a common first step in the
>> > svd, however: the matrix is factored to Q*R, then the svd of R is
>> > computed.
>>
>> I (finally) figured out svd and eigenvalue decomposition for this purpose.
>>
>> But from your description of QR, I thought specifically of the case
>> where we have a "natural" ordering of the regressors, similar to the
>> polynomial case of you and Anne. In the time series case it would be by
>> increasing lags:
>>
>> yt on y_{t-1}
>> yt on y_{t-1}, y_{t-2}
>> ...
>> yt on y_{t-k} for k = 1, ..., K
>>
>> or yt on xt and the lags of xt.
>>
>> This is really sequential LS with a predefined sequence, not PLS or
>> PCA/PCR or similar orthogonalization by "importance".
>> The usual procedure for deciding on the appropriate number of lags
>> loops over OLS with an increasing number of regressors.
>> From the discussion, I thought there might be a way to "cheat" in this
>> using QR and Gram-Schmidt.
>>
>
> Ah, then I think your idea would work. The norms of the residuals at each
> step would be along the diagonal of the R matrix. They won't necessarily
> decrease monotonically, however.
>
> Or if you are fitting a quantity with the lags, then Q.T*y gives the
> component of each orthogonalized lag. The running sum of the squares of
> the components should approach the variance of y, so I expect the ratio
> would give the percentage of the variance accounted for by the lags up to
> that point.
>
> Chuck
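A minimal sketch of that last suggestion, using a single QR factorization of the lag matrix. The function name lag_variance_ratios, the demeaning, and the way the first maxlag observations are dropped are assumptions for illustration, not anything specified in the thread:

import numpy as np

def lag_variance_ratios(y, maxlag):
    """Fraction of the variance of yt accounted for by lags 1..k,
    for each k up to maxlag, via one QR factorization of the lag
    matrix (sequential LS in the predefined lag order).

    y is a 1-d series; the first maxlag observations are dropped so
    the design matrix has shape (len(y) - maxlag, maxlag).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column j holds y lagged by j+1, aligned with the response yt.
    lags = np.column_stack([y[maxlag - k:n - k] for k in range(1, maxlag + 1)])
    yt = y[maxlag:]
    # Demean so the sums of squares correspond to variances.
    yt = yt - yt.mean()
    lags = lags - lags.mean(axis=0)
    q, r = np.linalg.qr(lags)
    # Components of yt along each orthogonalized lag (Gram-Schmidt order).
    c = np.dot(q.T, yt)
    # Running sum of squared components over the total sum of squares of yt.
    return np.cumsum(c ** 2) / np.dot(yt, yt)

The diagonal of r here plays the role Chuck mentions: entry j is the residual norm of lag column j after the preceding orthogonalized lags have been projected out, and those entries need not decrease monotonically.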
