-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
Behalf Of Beliavsky
Sent: Saturday, July 13, 2002 5:53 AM
To: [EMAIL PROTECTED]
Subject: why not Excel for regression?
In response to my question about a regression program, a few people
suggested
Excel -- with reservations. I do often use Excel for regressions, but a
command
line is my favorite GUI :). Could someone explain what is wrong with Excel's
regression function? Is this discussed on the web somewhere?
.
-------------------------------------------------------------------------
I have done an extensive analysis on Excel's regressions, including
evaluations with the NIST Strd tests.
One inherent problem is with all commercial stat packages, including Excel
is the limitations imposed by the IEEE 64 bit floating point number
representation. They all basically have limited performance. However, I can
get Excel to give more accurate answers than what comes out from some of the
commercial stat packages.
Excel matrix routines are fairly robust given the fact that the most likely
error is singularity. Banded data matrices do not usually occur. For Excel
using multivariate regression, the inversion routine is fairly robust
against moderate singularity.
Excel will only handle multivariate regressions with 16 or less independent
(X) variables.
Both the X (matrix) and Y (vector) data sets should be centered about the
means, and the centered data selected for the regressions. If your data is
only good to 2-3 significant figures, then the effort to center the data may
not justify the improved accuracy from the use of centered data.
Use Tools - Data Analysis - Regression to get the generally useful output.
The specific functions such as forecast, intercept, slope, steyx just show
single values. For the Data Analysis Regression routine you need to be
familiar with all the menu inputs.
Regression:
Input Y Range: A block defining the input Y data. A list of values.
Input X Range: A block defining the input X data. For a single
variable it
is a list, for multivariate data it is a contiguous block of
variables.
Labels: Does the first cell in the input data have a label? Applied to
both X and Y data blocks.
Constant is Zero: Check box if the regression is forced to go through
the
origin.
Confidence Level: Check box for +/- confidence level about the computed
coefficient values and put the desired level in the next box.
(Percentage).
Residuals: Boxes to be checked are:
Residuals: Check box if a column of residuals is wanted.
Standardized Residuals: Check if a column of standardized
residuals is
wanted.
Residual Plots: Check box if plot is desired
Line Fit Plots: Check box if plot is desired
Normal Probability Plots: Check box if plot is desired.
Output: Extensive, occupies 9 columns by 20 or more rows. Overwrites
any
cells. Include a row for each variable, 6 rows for any residual table
headers and a row of residual values for each data point
Attempting to try and use the Solver package to minimize the sum of squares
of the residuals will result in less accurate results. However for
non-linear regressions, it may be the only way. The Solver package uses
numerical first derivatives which does a poor job on solving regression
problems.
If you are trying to fit some non-linear equations, transform the data
first, then center, then fit, then translate coefficients to the
non-centered data, and then translate to the original data system. This of
course assumes that the residuals after transformation are approximately
normally distributed.
In your regression menu, do not check the "constant is zero" box. If you are
doing a regression through zero, I can give you way to avoid some to the
errors here.
Excel will not give you a plot of residuals versus "normal distribution" z
values to see if the residuals are near normally distributed if you check
the "Normal Probability Plots" box. It takes some additional calculations
and chart changes to get this.
Other than this, Excel will give you good results. The F dist results are
accurate for small tails to 1E-300. I haven't checked the t dist extreme
ranges yet.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
. http://jse.stat.ncsu.edu/ .
=================================================================
.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
. http://jse.stat.ncsu.edu/ .
=================================================================