Re: [math] Improving numerics in OLSMultipleLinearRegression

Mauro Talevi Mon, 09 Jun 2008 00:59:07 -0700

Hi Phil,

thanks for reviewing the multiple linear regression implementations and setting up the R/NIST data tests. I finally got around to installing R and can now run them too.


Phil Steitz wrote:

> While clear and elegant from a matrix algebra standpoint, the "naive" implementation in OLSMultipleLinearRegression has bad numerical qualities. It is well known that solving the normal equations directly does not give good numerics. I just added some tests to actually verify parameter values, using the classic "Longley" dataset, for which NIST provides certified statistics. This is a "hard" (ill-conditioned) design matrix. R was able to get to within 1E-8 of the certified parameter values; OLSMultipleLinearRegression only gets to within 1E-1.
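The usual textbook explanation for why the normal equations struggle on a design like Longley's can be written out in a few lines (standard numerical linear algebra, assuming X has full column rank): forming X^T X squares the condition number, while a thin QR factorization reduces the problem to a triangular system whose conditioning matches that of X itself.

```latex
\hat\beta = (X^\top X)^{-1} X^\top y, \qquad
\kappa_2(X^\top X) = \kappa_2(X)^2 .
% With the thin QR factorization X = QR (Q^\top Q = I, R upper triangular and invertible):
X^\top X \hat\beta = X^\top y
\;\Longleftrightarrow\; R^\top R \hat\beta = R^\top Q^\top y
\;\Longleftrightarrow\; R \hat\beta = Q^\top y,
\qquad \kappa_2(R) = \kappa_2(X).
```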

The OLS implementation was added as a simple by-product of the GLS case (the one I mainly needed for hypothesis testing), since it came "for free" with an identity covariance matrix. True, the emphasis was on clarity and formulaic simplicity, following Donald Knuth's old maxim that "premature optimization is the root of all evil". But it seems the implementation does need refinement: the devil raised his head :-)
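For reference, the standard GLS estimator with error covariance matrix Omega, and how it collapses to the OLS formula when Omega is the identity (textbook formulas, not copied from the code itself):

```latex
\hat\beta_{\mathrm{GLS}} = \left(X^\top \Omega^{-1} X\right)^{-1} X^\top \Omega^{-1} y,
\qquad
\Omega = I \;\Longrightarrow\;
\hat\beta_{\mathrm{OLS}} = \left(X^\top X\right)^{-1} X^\top y .
```

Because the OLS path inherits the explicit inverse of X^T X from this formula, it also inherits the normal-equations conditioning problem.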

> We have talked in the past about providing an implementation based on QR decomposition. Anyone up for using the QR decomposition that we now have to do this? I really think we need to do it (or something else to improve numerics) before releasing this class. I will get to it eventually, but am a little pegged at the moment. I will review and apply patches if someone is willing to do the implementation. I can also explain here or offline how the R tests and NIST datasets work, as these are useful in validating code.
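A minimal, self-contained sketch of the QR route, assuming a dense n-by-p design matrix with n >= p and full column rank. It writes out Householder reflections by hand rather than calling the QR decomposition class mentioned above, since it is only meant to show the algorithm; the class and method names (QrLeastSquares, solve) are hypothetical.

```java
/**
 * Hypothetical sketch: ordinary least squares via Householder QR,
 * so that X^T X is never formed. Not the Commons Math API.
 */
public final class QrLeastSquares {

    private QrLeastSquares() {
    }

    /**
     * Returns beta minimizing ||y - X beta||_2.
     * Assumes x is n-by-p with n >= p and full column rank.
     */
    public static double[] solve(double[][] x, double[] y) {
        int n = x.length;
        int p = x[0].length;
        // Work on copies so the caller's arrays are left untouched.
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++) {
            a[i] = x[i].clone();
        }
        double[] b = y.clone();

        for (int k = 0; k < p; k++) {
            // Euclidean norm of column k from row k down (hypot avoids overflow).
            double norm = 0.0;
            for (int i = k; i < n; i++) {
                norm = Math.hypot(norm, a[i][k]);
            }
            if (norm == 0.0) {
                throw new IllegalArgumentException("design matrix is rank deficient");
            }
            // Pick the sign opposite to the pivot to avoid cancellation.
            double alpha = a[k][k] > 0 ? -norm : norm;
            // Householder vector v = column - alpha * e_k, stored in column k.
            a[k][k] -= alpha;
            double vtv = 0.0;
            for (int i = k; i < n; i++) {
                vtv += a[i][k] * a[i][k];
            }
            // Apply H = I - 2 v v^T / (v^T v) to the remaining columns.
            for (int j = k + 1; j < p; j++) {
                double dot = 0.0;
                for (int i = k; i < n; i++) {
                    dot += a[i][k] * a[i][j];
                }
                double f = 2.0 * dot / vtv;
                for (int i = k; i < n; i++) {
                    a[i][j] -= f * a[i][k];
                }
            }
            // Apply the same reflection to y, accumulating Q^T y.
            double dotB = 0.0;
            for (int i = k; i < n; i++) {
                dotB += a[i][k] * b[i];
            }
            double fb = 2.0 * dotB / vtv;
            for (int i = k; i < n; i++) {
                b[i] -= fb * a[i][k];
            }
            // Diagonal entry of R; entries below it are no longer needed.
            a[k][k] = alpha;
        }

        // Back substitution on the upper-triangular system R beta = (Q^T y)[0..p).
        double[] beta = new double[p];
        for (int k = p - 1; k >= 0; k--) {
            double s = b[k];
            for (int j = k + 1; j < p; j++) {
                s -= a[k][j] * beta[j];
            }
            beta[k] = s / a[k][k];
        }
        return beta;
    }
}
```

For example, fitting y = 1 + 2x from the points (0, 1), (1, 3), (2, 5), (3, 7), with a leading column of 1's in X, should return coefficients very close to {1, 2}. Checking against the NIST certified values would then be a matter of feeding the Longley data through the same call.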

I'd be happy to improve the implementation. I'm getting my head around R and NIST, but perhaps a chat offline would not hurt!

> Another thing that we should think about before releasing any of this stuff is the completeness of the API. Many standard regression statistics are missing. If we are going to stick with the Interface/Implementation setup, we need to get the right stuff into the interface. It is also awkward to have to insert "1"'s in the design matrix to get an intercept term computed. This is convenient for the implementation, but awkward for users. A more natural setup (IMHO) would be to expose a "noIntercept" or "hasIntercept" property for the model.
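One possible shape for the "hasIntercept" idea, sketched under the assumption that the model simply builds the augmented design matrix internally so users never add the column of 1's themselves. The class and method names (InterceptDesign, buildDesign) are hypothetical and not part of any existing API.

```java
/**
 * Hypothetical helper: builds the design matrix from raw regressors,
 * prepending a column of 1's only when an intercept is requested.
 */
public final class InterceptDesign {

    private InterceptDesign() {
    }

    public static double[][] buildDesign(double[][] regressors, boolean hasIntercept) {
        double[][] design = new double[regressors.length][];
        for (int i = 0; i < regressors.length; i++) {
            if (!hasIntercept) {
                // No intercept: use the regressors as given (defensive copy).
                design[i] = regressors[i].clone();
                continue;
            }
            double[] row = new double[regressors[i].length + 1];
            row[0] = 1.0; // intercept column
            System.arraycopy(regressors[i], 0, row, 1, regressors[i].length);
            design[i] = row;
        }
        return design;
    }
}
```

With hasIntercept set to true, the regressors {{2}, {3}} become {{1, 2}, {1, 3}}; with false they pass through unchanged and the fitted model has no constant term.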

No problem with adding other statistics; let's just decide what the standard regression API should be.
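As a starting point for deciding which statistics belong in the interface, here is a sketch of a few commonly reported ones (residual sum of squares, R^2 and the usual error-variance estimate). It assumes the fitted values are already available and that the model includes an intercept, so the centered form of R^2 applies; the class and method names are hypothetical.

```java
/**
 * Hypothetical helpers for a few standard regression statistics.
 * Assumes the model includes an intercept (centered R^2).
 */
public final class RegressionStats {

    private RegressionStats() {
    }

    /** Residual sum of squares: sum over i of (y_i - fitted_i)^2. */
    public static double residualSumOfSquares(double[] y, double[] fitted) {
        if (y.length != fitted.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double rss = 0.0;
        for (int i = 0; i < y.length; i++) {
            double r = y[i] - fitted[i];
            rss += r * r;
        }
        return rss;
    }

    /** Total sum of squares about the sample mean of y. */
    public static double totalSumOfSquares(double[] y) {
        double mean = 0.0;
        for (double v : y) {
            mean += v;
        }
        mean /= y.length;
        double tss = 0.0;
        for (double v : y) {
            double d = v - mean;
            tss += d * d;
        }
        return tss;
    }

    /** Coefficient of determination: 1 - RSS / TSS. */
    public static double rSquared(double[] y, double[] fitted) {
        return 1.0 - residualSumOfSquares(y, fitted) / totalSumOfSquares(y);
    }

    /**
     * Error-variance estimate RSS / (n - p), where p is the number of
     * estimated parameters, including the intercept.
     */
    public static double errorVariance(double[] y, double[] fitted, int p) {
        int dof = y.length - p;
        if (dof <= 0) {
            throw new IllegalArgumentException("need more observations than parameters");
        }
        return residualSumOfSquares(y, fitted) / dof;
    }
}
```

For y = {1, 2, 3, 4} and fitted values {1.5, 1.5, 3.5, 3.5}, RSS is 1 and TSS is 5, so R^2 is 0.8, and with p = 2 the error-variance estimate is 0.5.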


And finally, how do you see the no/hasIntercept model working?

Cheers



