----- Original Message -----
From: Warren Sarle <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, May 04, 2000 12:23 PM
Subject: Re: no correlation assumption among X's in MLR


> Of course Herman is right (as usual)! Where are people getting this
> ridiculous idea that correlation and collinearity are the same thing?
............................................................
Statistics is one field with almost no agreed-upon usage of terms.
Everybody is independent.

One of my books, "Applied Linear Regression Models" by Neter, Wasserman
and Kutner (1989), says: "When the independent variables are correlated
among themselves, intercorrelation or multicollinearity among them is said
to exist. (Sometimes the latter term is reserved for those instances when
the correlation among independent variables is very high.)..." The authors
use multicollinearity to refer to the correlation between X variables.
(Who is right?)

From a numerical analysis viewpoint, the basic matrix in OLS is the
cross-product matrix X'X. If the X variables are standardized, this matrix
is (up to a factor of n-1) the correlation matrix; if they are merely
centered, it is the covariance matrix.

It is clear, then, that there is a numerical difference between the
covariance and correlation matrices.
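
For concreteness, a small illustrative NumPy sketch (hypothetical code,
not from any package under discussion) showing that relationship:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))            # 100 cases, 3 X variables
    n = X.shape[0]

    Xc = X - X.mean(axis=0)                  # centered columns
    Xs = Xc / X.std(axis=0, ddof=1)          # standardized columns

    cov  = Xc.T @ Xc / (n - 1)               # covariance matrix
    corr = Xs.T @ Xs / (n - 1)               # correlation matrix

    print(np.allclose(cov,  np.cov(X, rowvar=False)))       # True
    print(np.allclose(corr, np.corrcoef(X, rowvar=False)))  # True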
............................................................
>
> Assuming you're using an intercept, a pair of variables is
> collinear if and only if their correlation is 1.0 or -1.0.
> Three or more variables are collinear if and only if there
> is at least one of the variables that has a multiple
> correlation of 1.0 with the other variables.
............................................................
This may be your interpretation, but it is not universal.
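
For reference, the definition quoted above is easy to check numerically
(an illustrative NumPy sketch; the variables are made up): with
x3 = x1 + x2, the multiple correlation of x3 with the other two is exactly
1.0 even though no pairwise correlation is:

    import numpy as np

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=200)
    x2 = rng.normal(size=200)
    x3 = x1 + x2                             # exactly collinear with x1, x2

    # Regress x3 on x1 and x2 (with intercept); the R from this fit is
    # the multiple correlation of x3 with the other variables.
    X = np.column_stack([np.ones(200), x1, x2])
    beta, *_ = np.linalg.lstsq(X, x3, rcond=None)
    resid = x3 - X @ beta
    print(1 - resid.var() / x3.var())        # 1.0 (up to rounding)
    print(np.corrcoef(x1, x3)[0, 1])         # about 0.7, nowhere near 1.0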
...............................................................
>
> If the independent variables in a multiple linear regression are
> collinear, there are infinitely many sets of least-squares
> regression coefficients that produce the same predictions, MSE,
> R-squared, etc.
..............................................................
This is only true when the correlation matrix has off-diagonal entries of
exactly 1.0. If an entry is slightly different because of the numerical
representation of floating-point numbers in the computer, there will be a
finite set of apparently identical solutions.
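
A small illustrative sketch of both points (hypothetical NumPy code):
under exact collinearity, coefficient vectors that differ by a null-space
vector give identical predictions, while a tiny perturbation of a single
entry makes the matrix technically full rank again:

    import numpy as np

    rng = np.random.default_rng(2)
    x1 = rng.normal(size=50)
    x2 = 2.0 * x1                            # exact collinearity
    y  = 3.0 + x1 + rng.normal(size=50)
    X  = np.column_stack([np.ones(50), x1, x2])

    # Two different coefficient vectors solving the same problem:
    b1 = np.linalg.pinv(X) @ y               # minimum-norm solution
    b2 = b1 + np.array([0.0, 2.0, -1.0])     # plus a null-space vector
    print(np.allclose(X @ b1, X @ b2))       # True: identical predictions

    # A tiny perturbation removes the exact singularity:
    X2 = X.copy()
    X2[0, 2] += 1e-9
    print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(X2))   # 2 3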
...................................................

> Although least squares does not produce unique
> estimates, if you have prior information, you may be able to get
> meaningful and useful Bayesian estimates. Regardless of whether you
> have prior information, you can get useful predictions for new
> cases lying in the same subspace as the original sample. Without
> prior information, you cannot get useful extrapolations outside of
> that subspace.  Statisticians who are not data miners sometimes
> forget the distinction between estimation and prediction. :-)
............................................................
For many years the method of ridge analysis (non-Bayesian) has been
extensively used in industry to get valid and workable extrapolations (i.e.
predictions) beyond the range of the data used. The technique of varying
lambda to reduce the variance inflation factors is a very good way to obtain
useful and valid predictions. (All non-Bayesian.)
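
A minimal ridge sketch (illustrative NumPy, assuming standardized
predictors and a centered response; not anyone's production code):
increasing lambda shrinks the coefficients and drives the variance
inflation factors down:

    import numpy as np

    rng = np.random.default_rng(3)
    x1 = rng.normal(size=100)
    x2 = x1 + 0.01 * rng.normal(size=100)    # nearly collinear pair
    X  = np.column_stack([x1, x2])
    X  = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    y  = x1 + rng.normal(size=100)
    y  = y - y.mean()

    n = X.shape[0]
    R = X.T @ X / (n - 1)                    # correlation matrix of the X's
    r = X.T @ y / (n - 1)
    for lam in (0.0, 0.1, 1.0, 10.0):
        b = np.linalg.solve(R + lam * np.eye(2), r)   # ridge coefficients
        # A common ridge generalization of the classical VIF = diag(inv(R)):
        Rinv = np.linalg.inv(R + lam * np.eye(2))
        vif  = np.diag(Rinv @ R @ Rinv)
        print(lam, b, vif)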
........................................................
> Collinearity generally will NOT cause different machines or
> different programs to get different answers provided the same
> algorithm is used and the programs are written competently.
> For example, two different programs running on different machines
> will get the same estimates (except perhaps for the last few bits)
> provided both use a Moore-Penrose inverse, or both use a G1 inverse,
> and both do singularity checks correctly. If the numerous SAS procs
> for regression started disagreeing with each other, or started
> getting different answers on different machines just because of
> collinearity, our QA people would totally freak out!
...................................................................
Obviously you have no extensive experience in writing numerical programs in
C++, Fortran or Visual Basic. If you had, you would have experienced
directly the differences between machines and compilers, and how they
handle floating-point numbers.

The Moore-Penrose inverse is a symbolic reference in equations. It has
nothing to do with the machinery of going from data to parameter values.
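
For what it is worth, here is a sketch of the usual machinery (illustrative
NumPy): a Moore-Penrose inverse is typically computed from the SVD, and the
cutoff deciding which singular values count as zero is exactly the kind of
arbitrary implementation choice at issue here:

    import numpy as np

    def pinv_svd(A, rcond=1e-15):
        """Moore-Penrose pseudoinverse via the SVD."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        keep = s > rcond * s.max()           # the arbitrary singularity cutoff
        s_inv = np.zeros_like(s)
        s_inv[keep] = 1.0 / s[keep]
        return Vt.T @ (s_inv[:, None] * U.T)

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])               # rank 1
    print(np.allclose(pinv_svd(A), np.linalg.pinv(A)))   # True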

One way that software developers mask the differences is by reporting
parameter values to only 5 figures. In my camp, if there is a difference
between two 15-digit numbers arrived at by two different methods, there is
concern. This is why NIST developed a whole set of tests to bring out the
computational problems.

An algorithm is the specific way an equation is computed in a computer
program. I can calculate a polynomial in many ways, but there is only one
generally recommended way to do it to minimize error: Horner's rule.
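
For example (an illustrative sketch in Python, not anyone's production
routine), Horner's rule evaluates a degree-n polynomial with n multiplies
and n adds, avoiding the error buildup of forming explicit powers:

    def horner(coeffs, x):
        """Evaluate a polynomial; coeffs in descending order of power."""
        result = 0.0
        for a in coeffs:
            result = result * x + a
        return result

    # 3x^3 - 2x^2 + 5x - 7 at x = 2:  24 - 8 + 10 - 7 = 19
    print(horner([3.0, -2.0, 5.0, -7.0], 2.0))   # 19.0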

For example, consider computing the cumulative normal distribution. I have
my choice of equations (in Abramowitz and Stegun) 26.2.10, 26.2.11, 26.2.14
or 26.2.13. By trials I chose 26.2.11, because it is faster; I then found
that it fails to give acceptable results at larger x values, so I use
26.2.15 for the higher region. The break point is arbitrary. The continued
fraction solution has problems because of the way it is calculated (3.10.1),
and I modify 3.10.1(2) to minimize error in the summations. The final thing
is an algorithm, and it includes arbitrary decisions. Some other developer
doing this would do it entirely differently, such as using one or more of
the many rational approximations (because they are faster, and accuracy is
not a concern).
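
A minimal sketch of the "different formulas for different regions" idea
(hypothetical Python, not my actual routine; the series and the erfc form
are standard, and the break point is arbitrary):

    import math

    def phi_series(x, terms=60):
        """Normal CDF: 1/2 + pdf(x) * (x + x^3/3 + x^5/(3*5) + ...).
        Fine near zero, increasingly poor in the tails."""
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        term = total = x
        for k in range(1, terms):
            term *= x * x / (2 * k + 1)
            total += term
        return 0.5 + pdf * total

    def phi_tail(x):
        """Normal CDF via the complementary error function; tail-stable."""
        return 0.5 * math.erfc(-x / math.sqrt(2.0))

    def phi(x, brk=6.0):                 # the break point is arbitrary
        return phi_series(x) if abs(x) < brk else phi_tail(x)

    for x in (0.5, 2.0, 8.0):
        print(x, phi_series(x), phi_tail(x))   # the two diverge at x = 8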

>
> Where you run into trouble is with "multicollinearity." This is a
> confusing term, but what it means is "almost but not quite collinear."
> Multicollinearity implies numerical ill-conditioning, which means
> that small changes in the data
............................................................
We use perturbation theory and analysis to determine the magnitude of the
problem. The fixes will be entirely different between developers.
............................................................

> can produce large changes in the
> parameter estimates. Singularity tests can be quite delicate, and
> if two different programs make different decisions about the rank
> of the design matrix, then they can get radically different estimates.
............................................................
If you have a matrix that is even close to being singular, you have to use
rank reduction techniques to get a reasonable answer. Each software package
that detects this obviously has different criteria, different rank
reduction algorithms, and different ways of determining which variables to
drop. There is no uniform way. Look at all the different matrix methods in
Fortran in LINPACK (and a bunch of others).
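
To make this concrete (an illustrative NumPy sketch, with made-up data):
the singular values show how close the design matrix is to singular, the
rank decision depends on an arbitrary tolerance, and a tiny perturbation of
the data swings the unregularized estimates:

    import numpy as np

    rng = np.random.default_rng(4)
    x1 = rng.normal(size=60)
    x2 = x1 + 1e-8 * rng.normal(size=60)     # nearly collinear
    X  = np.column_stack([np.ones(60), x1, x2])
    y  = 1.0 + x1 + rng.normal(size=60)

    s = np.linalg.svd(X, compute_uv=False)
    print("singular values:", s)             # smallest is around 1e-8
    print("condition number:", s[0] / s[-1])

    # Two defensible rank decisions, two very different coefficient vectors:
    b_full = np.linalg.lstsq(X, y, rcond=1e-15)[0]   # treat as full rank
    b_red  = np.linalg.lstsq(X, y, rcond=1e-6)[0]    # drop the tiny direction
    print(b_full, b_red)

    # Tiny perturbation of the data, large change in the full-rank answer:
    y2 = y + 1e-6 * rng.normal(size=60)
    print(np.linalg.lstsq(X, y2, rcond=1e-15)[0] - b_full)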
>
DAHeiser



