I have a query about how R computes R-squared (the coefficient of determination) for linear regression through the origin.

The general expression for R-squared in regression (whether linear or non-linear) is

    R-squared = 1 - sum((y - ypredicted)^2) / sum((y - ybar)^2)

However, the lm function within R does not seem to use this expression when the intercept is constrained to be zero. It gives results different to Excel and other data analysis packages.

As an example (using the built-in cars data frame):

> cars.lm <- lm(dist ~ 0 + speed, data = cars)   # linear regression through origin
> summary(cars.lm)$r.squared   # report R-squared
[1] 0.8962893
> 1 - deviance(cars.lm) / sum((cars$dist - mean(cars$dist))^2)   # calculates R-squared directly
[1] 0.6018997
> # The latter corresponds to the value reported by Excel (and other data analysis packages)
>
> # Note that we expect R-squared to be smaller for linear regression through the origin
> # than for linear regression without a constraint (which is 0.6511 in this example)

Does anyone know what R is doing in this case? Is there an option to get R to return what I termed the "general" expression for R-squared? The adjusted R-squared value is also affected. [Other parameters all seem correct.]

Thanks for any help on this issue,

Patrick

P.S. I believe old versions of Excel (before 2003) also had this issue.

--
Dr Patrick J. Barrie
Department of Chemical Engineering and Biotechnology
University of Cambridge
Philippa Fawcett Drive, Cambridge CB3 0AS
01223 331864
pj...@cam.ac.uk
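
Addendum: in case it helps anyone reproduce the comparison, the direct calculation in the example above can be wrapped in a small helper. This is only a minimal sketch: general_r_squared is a hypothetical name of my own (not a base R function or an lm option), and it assumes an unweighted lm fit.

```r
# Hypothetical helper (not part of base R): R-squared from the "general"
# expression 1 - RSS / TSS, with TSS always taken about mean(y),
# whether or not the model contains an intercept.
# Assumes an unweighted lm fit.
general_r_squared <- function(fit) {
  y   <- model.response(model.frame(fit))  # observed response values
  rss <- sum(residuals(fit)^2)             # residual sum of squares
  tss <- sum((y - mean(y))^2)              # total sum of squares about the mean
  1 - rss / tss
}

cars.origin <- lm(dist ~ 0 + speed, data = cars)  # regression through the origin
cars.free   <- lm(dist ~ speed, data = cars)      # unconstrained intercept

general_r_squared(cars.origin)  # same quantity as the direct calculation above
general_r_squared(cars.free)    # should agree with summary(cars.free)$r.squared
```

For the unconstrained fit the helper should reproduce the value that summary() reports, so the two only differ when the intercept is removed.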