Re: Excel multiple regression equations...Help please!

Frank E Harrell Jr Thu, 30 May 2002 06:26:09 -0700

If you have to center the data, then for sure Excel is using antiquated matrix 
arithmetic.  Why rely on Excel when there is far better (and free, e.g., R) software 
for doing this?  -Frank Harrell


On 29 May 2002 17:31:27 -0700
[EMAIL PROTECTED] (David Heiser) wrote:

> 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> Behalf Of GreekEagle
> Sent: Wednesday, May 29, 2002 10:23 AM
> To: [EMAIL PROTECTED]
> Subject: Excel multiple regression equations...Help please!
> ---------------------------------------------------------------
> 1. Excel uses the standard matrix method here.
> Goto any textbook that gives the matrix form for linear regression with
> intercepts
> The Data Analysis package just uses the existing Excel matrix functions.
> These are:
> **************************
> Matrix Operations, Range Inputs, Single or Range Outputs
> MDETERM � Returns the matrix determinant (as a single value) of a symmetric
> array. All cells within the range must have a number.
> {MINVERSE} � Returns the inverse matrix of a symmetric matrix. All cells
> within the range must have a number.
> {MMULT} � Returns the matrix product of two arrays. The two arrays must have
> conformal rows and columns. Both arrays must have numbers in all cells.
> {TRANSPOSE} � Returns the transpose of an array. Used to shift col-row
> arrays to row-col form
> ***************************************************************************
> MULTIVARIATE REGRESSION APPLICATIONS
> 
> This includes polynomial fits of single variables, where the power terms are
> generated as separate variables. Given the scope of Excel, it is best to
> take a simple approach to multivariate regression. Excel lacks the tools to
> properly evaluate these more complex fits. Also to be recognized is that the
> more complex fits may fit the data very well within the range of the data,
> but give totally wrong results when predictions are made using variable
> values beyond the range of the generating data. This is especially true of
> polynomial fits.
> For polynomial fits it is best to center the X data, then derive the power
> terms as additional variables from the centered X values.
> 
> For multivariate data, linear regression in LINEST is done by matrix
> operations. The inverse of the (X�X) matrix is accurate for non-singular X
> matrices to about 1E-14. It appears to be a straight forward LU
> decomposition. For accuracy, it is best to center the data after any
> transformations, since the X�X matrix is otherwise dominated by the squares
> of the absolute values, and will result in inaccurate results.
> ***************************************************************************
> Notes:
>       Be sure to center the data and use the centered data as the range of
> inputs. Use the regression with intercepts to get the right R2 and t values.
> DAH
> ------------------------------------------------
> ................. but i cant tell if
> Excel is using it with no differences,correlations etc.Can you help me
> please?Anything......
> ------------------------------------------------
> Numerical testing shows that Excel gives the right values here, except when
> regression through the origin is selected, or when true singularity occurs
> in the matrices. Excel does fine when the correlations between X variables
> are as high as 0.999, as long as the data going in has been centered (i.e.
> subtracted from the respective means)
> DAH
> ------------------------------------------------------------
> 
> -My final regression functions giving the Y values, have 5 or 6 (X)
> different variables with 23 observations for each variable.My data are
> strongly linear with determination (R Square) over 0,95.However the
> t-Student values( for 95% precision)  for the variables' coefficents
> are below the crtical "Student" value in many cases.I would like most
> of the X variables to be statistically important in my Y functions.I
> know it is impossible that all the X variables are statistically
> important (with t-Student) but i would like them to be as more as
> possible...I tried to improve that so I have tried to make some
> combinations among the X variables different from those i originally
> wanted to..i tried to increase the variables' observations with some
> more data ...i transformed my data to logarithms and i tested
> regression again but it didnt help much.Any ideas or tips for how i
> can do it?Even a new data process method is acceptable!
> 
> Thanks for your help!Cheers!
> ---------------------------------------------------------------
> Excel is weak here, since it does not help you identify what is the problem.
> One help is to do stepwise regression, looking at the residuals in terms of
> a normal plot. Excel however in the Data Analysis package does not give the
> right chart when the box is checked. You have to go through a procedure to
> get the right plot. I can send it as a file if you are interested. These
> residual plots may indicate that one or more points may be dominating the
> solution, and trimming the data may give what you are looking for.
> 
> Here is a typical (partial) Excel output (from the centered Longley data).
>            Coefficients          Standard Error    t Stat          P-value     Lower 
>95%
> Upper 95%
> Intercept     1.55227E-13          80.19964825   1.93551E-15
>   1.000000000 -184.9408401    184.9408401
> X1            17.69270551          89.65919182         0.197332868   0.848488565     
> -189.0618953
> 224.4473063
> X2            -0.035358222    0.035266279     -1.002607104   0.345408261      
>-0.116682459
> 0.045966016
> X3            -2.039028779    0.516631745     -3.94677408
> .004255236    -3.230384489     -0.847673069
> X4            -1.065810024    0.243234175     -4.381826788
> 02343047      -1.6267094           -0.504910648
> X5            -0.060888276    0.239469089     -0.254263614   0.805705356      
>-0.613105343
> 0.491328791
> X6        2354.192413    1546.080224   1.522684513   0.166337112 -1211.077282
> 5919.462107
> X7        -522.1544056     1461.828537        -0.357192648
>   0.730186832 -3893.139236      2848.830425
> 
> The P-value represents the probability that the coefficient is truly zero.
> The lower and upper 95% represent coefficient values that would include 95%
> of all calculated coefficient values from calculations on additional data
> sets taken as repetitions from a parent population. Most real data is like
> this, showing a large uncertainty in the true value of a coefficient.
> Although the R squared here is .9955, it clearly shows that any one of the
> coefficients could be zero, but there is a probability of 0.0023 that all
> the coefficients would be zero.
> 
> Also consider what field/discipline you are working in. Psychologists tend
> to accept anything over 90% as being significant. Others it has to be over
> 99%.
>  DAHeiser
> 
> .
> .
> =================================================================
> Instructions for joining and leaving this list, remarks about the
> problem of INAPPROPRIATE MESSAGES, and archives are available at:
> .                  http://jse.stat.ncsu.edu/                    .
> =================================================================


-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================

Re: Excel multiple regression equations...Help please!

Reply via email to