-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of GreekEagle
Sent: Wednesday, May 29, 2002 10:23 AM
To: [EMAIL PROTECTED]
Subject: Excel multiple regression equations...Help please!
---------------------------------------------------------------
1. Excel uses the standard matrix method here.
Goto any textbook that gives the matrix form for linear regression with
intercepts
The Data Analysis package just uses the existing Excel matrix functions.
These are:
**************************
Matrix Operations, Range Inputs, Single or Range Outputs
MDETERM � Returns the matrix determinant (as a single value) of a symmetric
array. All cells within the range must have a number.
{MINVERSE} � Returns the inverse matrix of a symmetric matrix. All cells
within the range must have a number.
{MMULT} � Returns the matrix product of two arrays. The two arrays must have
conformal rows and columns. Both arrays must have numbers in all cells.
{TRANSPOSE} � Returns the transpose of an array. Used to shift col-row
arrays to row-col form
***************************************************************************
MULTIVARIATE REGRESSION APPLICATIONS
This includes polynomial fits of single variables, where the power terms are
generated as separate variables. Given the scope of Excel, it is best to
take a simple approach to multivariate regression. Excel lacks the tools to
properly evaluate these more complex fits. Also to be recognized is that the
more complex fits may fit the data very well within the range of the data,
but give totally wrong results when predictions are made using variable
values beyond the range of the generating data. This is especially true of
polynomial fits.
For polynomial fits it is best to center the X data, then derive the power
terms as additional variables from the centered X values.
For multivariate data, linear regression in LINEST is done by matrix
operations. The inverse of the (X�X) matrix is accurate for non-singular X
matrices to about 1E-14. It appears to be a straight forward LU
decomposition. For accuracy, it is best to center the data after any
transformations, since the X�X matrix is otherwise dominated by the squares
of the absolute values, and will result in inaccurate results.
***************************************************************************
Notes:
Be sure to center the data and use the centered data as the range of
inputs. Use the regression with intercepts to get the right R2 and t values.
DAH
------------------------------------------------
................. but i cant tell if
Excel is using it with no differences,correlations etc.Can you help me
please?Anything......
------------------------------------------------
Numerical testing shows that Excel gives the right values here, except when
regression through the origin is selected, or when true singularity occurs
in the matrices. Excel does fine when the correlations between X variables
are as high as 0.999, as long as the data going in has been centered (i.e.
subtracted from the respective means)
DAH
------------------------------------------------------------
-My final regression functions giving the Y values, have 5 or 6 (X)
different variables with 23 observations for each variable.My data are
strongly linear with determination (R Square) over 0,95.However the
t-Student values( for 95% precision) for the variables' coefficents
are below the crtical "Student" value in many cases.I would like most
of the X variables to be statistically important in my Y functions.I
know it is impossible that all the X variables are statistically
important (with t-Student) but i would like them to be as more as
possible...I tried to improve that so I have tried to make some
combinations among the X variables different from those i originally
wanted to..i tried to increase the variables' observations with some
more data ...i transformed my data to logarithms and i tested
regression again but it didnt help much.Any ideas or tips for how i
can do it?Even a new data process method is acceptable!
Thanks for your help!Cheers!
---------------------------------------------------------------
Excel is weak here, since it does not help you identify what is the problem.
One help is to do stepwise regression, looking at the residuals in terms of
a normal plot. Excel however in the Data Analysis package does not give the
right chart when the box is checked. You have to go through a procedure to
get the right plot. I can send it as a file if you are interested. These
residual plots may indicate that one or more points may be dominating the
solution, and trimming the data may give what you are looking for.
Here is a typical (partial) Excel output (from the centered Longley data).
Coefficients Standard Error t Stat P-value Lower
95%
Upper 95%
Intercept 1.55227E-13 80.19964825 1.93551E-15
1.000000000 -184.9408401 184.9408401
X1 17.69270551 89.65919182 0.197332868 0.848488565
-189.0618953
224.4473063
X2 -0.035358222 0.035266279 -1.002607104 0.345408261 -0.116682459
0.045966016
X3 -2.039028779 0.516631745 -3.94677408
.004255236 -3.230384489 -0.847673069
X4 -1.065810024 0.243234175 -4.381826788
02343047 -1.6267094 -0.504910648
X5 -0.060888276 0.239469089 -0.254263614 0.805705356 -0.613105343
0.491328791
X6 2354.192413 1546.080224 1.522684513 0.166337112 -1211.077282
5919.462107
X7 -522.1544056 1461.828537 -0.357192648
0.730186832 -3893.139236 2848.830425
The P-value represents the probability that the coefficient is truly zero.
The lower and upper 95% represent coefficient values that would include 95%
of all calculated coefficient values from calculations on additional data
sets taken as repetitions from a parent population. Most real data is like
this, showing a large uncertainty in the true value of a coefficient.
Although the R squared here is .9955, it clearly shows that any one of the
coefficients could be zero, but there is a probability of 0.0023 that all
the coefficients would be zero.
Also consider what field/discipline you are working in. Psychologists tend
to accept anything over 90% as being significant. Others it has to be over
99%.
DAHeiser
.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
. http://jse.stat.ncsu.edu/ .
=================================================================