Thank you,
John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

>>> Berend Hasselman <b...@xs4all.nl> 2/18/2012 9:05 AM >>>

On 18-02-2012, at 14:36, John Sorkin wrote:

> I am trying to use matrix algebra to get the beta coefficients from a simple 
> bivariate linear regression, y=f(x).
> The coefficients should be computable using the following matrix algebra: 
> t(X)Y / t(x)X
> 
> I have pasted the code I wrote below. I clearly odes not work both because it 
> returns a matrix rather than a vector containing two elements the beta for 
> the intercept and the beta for x, and because the values produced by the 
> matrix algebra are not the same as those returned by the linear regression. 
> Can someone tell we where I have gone wrong, either in my use of matrix 
> algebra in R, or perhaps at a more fundamental theoretical level?
> Thanks,
> John
> 
> # Define intercept, x and y.
> int <- rep(1,100)
> x   <- 1:100
> y   <- 2*x + rnorm(100)
> 
> # Create a matrix to hold values.
> data           <- matrix(nrow=100,ncol=3)
> dimnames(data) <- list(NULL,c("int","x","y"))
> data[,"int"] <- int
> data[,"x"]   <- x
> data[,"y"]   <- y
> data
> 
> # Compute numerator.
> num <-  cov(data)
> num
> 
> # Compute denominator
> denom <- solve(t(data) %*% data)
> denom
> 
> # Compute betas, [t(X)Y]/[t(X)Y]
> betaRon <-    num %*% denom
> betaRon
> 
> # Get betas from regression so we can check
> # values obtaned by matrix algebra.
> fit0 <- lm(y~x)
> 

data should only contain the independent (righthand side) variables.

So

# Create a matrix to hold values.
data           <- matrix(nrow=100,ncol=2)
dimnames(data) <- list(NULL,c("int","x"))#,"y"))
data[,"int"] <- int
data[,"x"]   <- x

You should get correct results.
A better way to calculate the beta's is to calculate them directly without 
explicitly calculating an invers

# Better
# Compute betas, from t(X)X beta = t(Z) y
# without computing inverse explicitly
betaAlt <- solve(t(data) %*% data, num)
betaAlt

Even better is the method lm uses without forming t(data) %*%  data explicitly 
but using a (pivoted) QR decomposition of data.

Berend





Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to