Re: Excel multiple regression equations...Help please!

GreekEagle Thu, 30 May 2002 14:55:58 -0700

I have to really thank you all for your time!

@ David Heiser.


Your remarks were very helpful and escpecially that one about stepwise
regression.I m working on them already!That procedure for the right
plot sounds interesting.I would really appreciate it if you could sent
it to me.Thanks again!

@ Frank Jarrel

Would you please tell me which software you mean or where i can find
it?My work till now was 100% in Excel and i never thought to use an
other software as i never actually needed a "powerful tool" for
statistics.It seems that i need it now!

@ Jerry Dizinno

Dont worry you didnt insult me...You are right that i m no expert in
statistics as i never had to work with so many data for regression
before.My essay does not depend so much in statistics but i need  some
general statistical indicators for my data, to go on with it.Cheers!





Frank E Harrell Jr <[EMAIL PROTECTED]> wrote in message 
news:<[EMAIL PROTECTED]>...
> If you have to center the data, then for sure Excel is using antiquated matrix 
>arithmetic.  Why rely on Excel when there is far better (and free, e.g., R) software 
>for doing this?  -Frank Harrell
> 
> On 29 May 2002 17:31:27 -0700
> [EMAIL PROTECTED] (David Heiser) wrote:
> 
> > 
> > 
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On
> > Behalf Of GreekEagle
> > Sent: Wednesday, May 29, 2002 10:23 AM
> > To: [EMAIL PROTECTED]
> > Subject: Excel multiple regression equations...Help please!
> > ---------------------------------------------------------------
> > 1. Excel uses the standard matrix method here.
> > Goto any textbook that gives the matrix form for linear regression with
> > intercepts
> > The Data Analysis package just uses the existing Excel matrix functions.
> > These are:
> > **************************
> > Matrix Operations, Range Inputs, Single or Range Outputs
> > MDETERM &#8211; Returns the matrix determinant (as a single value) of a symmetric
> > array. All cells within the range must have a number.
> > {MINVERSE} &#8211; Returns the inverse matrix of a symmetric matrix. All cells
> > within the range must have a number.
> > {MMULT} &#8211; Returns the matrix product of two arrays. The two arrays must have
> > conformal rows and columns. Both arrays must have numbers in all cells.
> > {TRANSPOSE} &#8211; Returns the transpose of an array. Used to shift col-row
> > arrays to row-col form
> > ***************************************************************************
> > MULTIVARIATE REGRESSION APPLICATIONS
> > 
> > This includes polynomial fits of single variables, where the power terms are
> > generated as separate variables. Given the scope of Excel, it is best to
> > take a simple approach to multivariate regression. Excel lacks the tools to
> > properly evaluate these more complex fits. Also to be recognized is that the
> > more complex fits may fit the data very well within the range of the data,
> > but give totally wrong results when predictions are made using variable
> > values beyond the range of the generating data. This is especially true of
> > polynomial fits.
> > For polynomial fits it is best to center the X data, then derive the power
> > terms as additional variables from the centered X values.
> > 
> > For multivariate data, linear regression in LINEST is done by matrix
> > operations. The inverse of the (X&#8217;X) matrix is accurate for non-singular X
> > matrices to about 1E-14. It appears to be a straight forward LU
> > decomposition. For accuracy, it is best to center the data after any
> > transformations, since the X&#8217;X matrix is otherwise dominated by the squares
> > of the absolute values, and will result in inaccurate results.
> > ***************************************************************************
> > Notes:
> >     Be sure to center the data and use the centered data as the range of
> > inputs. Use the regression with intercepts to get the right R2 and t values.
> > DAH
> > ------------------------------------------------
> > ................. but i cant tell if
> > Excel is using it with no differences,correlations etc.Can you help me
> > please?Anything......
> > ------------------------------------------------
> > Numerical testing shows that Excel gives the right values here, except when
> > regression through the origin is selected, or when true singularity occurs
> > in the matrices. Excel does fine when the correlations between X variables
> > are as high as 0.999, as long as the data going in has been centered (i.e.
> > subtracted from the respective means)
> > DAH
> > ------------------------------------------------------------
> > 
> > -My final regression functions giving the Y values, have 5 or 6 (X)
> > different variables with 23 observations for each variable.My data are
> > strongly linear with determination (R Square) over 0,95.However the
> > t-Student values( for 95% precision)  for the variables' coefficents
> > are below the crtical "Student" value in many cases.I would like most
> > of the X variables to be statistically important in my Y functions.I
> > know it is impossible that all the X variables are statistically
> > important (with t-Student) but i would like them to be as more as
> > possible...I tried to improve that so I have tried to make some
> > combinations among the X variables different from those i originally
> > wanted to..i tried to increase the variables' observations with some
> > more data ...i transformed my data to logarithms and i tested
> > regression again but it didnt help much.Any ideas or tips for how i
> > can do it?Even a new data process method is acceptable!
> > 
> > Thanks for your help!Cheers!
> > ---------------------------------------------------------------
> > Excel is weak here, since it does not help you identify what is the problem.
> > One help is to do stepwise regression, looking at the residuals in terms of
> > a normal plot. Excel however in the Data Analysis package does not give the
> > right chart when the box is checked. You have to go through a procedure to
> > get the right plot. I can send it as a file if you are interested. These
> > residual plots may indicate that one or more points may be dominating the
> > solution, and trimming the data may give what you are looking for.
> > 
> > Here is a typical (partial) Excel output (from the centered Longley data).
> >          Coefficients          Standard Error    t Stat          P-value     Lower 
>95%
> > Upper 95%
> > Intercept   1.55227E-13          80.19964825   1.93551E-15
> >   1.000000000       -184.9408401    184.9408401
> > X1          17.69270551          89.65919182         0.197332868   0.848488565     
> -189.0618953
> > 224.4473063
> > X2          -0.035358222    0.035266279     -1.002607104   0.345408261      
>-0.116682459
> > 0.045966016
> > X3          -2.039028779    0.516631745     -3.94677408
> > .004255236  -3.230384489     -0.847673069
> > X4          -1.065810024    0.243234175     -4.381826788
> > 02343047    -1.6267094           -0.504910648
> > X5          -0.060888276    0.239469089     -0.254263614   0.805705356      
>-0.613105343
> > 0.491328791
> > X6        2354.192413          1546.080224   1.522684513   0.166337112 -1211.077282
> > 5919.462107
> > X7        -522.1544056     1461.828537      -0.357192648
> >   0.730186832 -3893.139236      2848.830425
> > 
> > The P-value represents the probability that the coefficient is truly zero.
> > The lower and upper 95% represent coefficient values that would include 95%
> > of all calculated coefficient values from calculations on additional data
> > sets taken as repetitions from a parent population. Most real data is like
> > this, showing a large uncertainty in the true value of a coefficient.
> > Although the R squared here is .9955, it clearly shows that any one of the
> > coefficients could be zero, but there is a probability of 0.0023 that all
> > the coefficients would be zero.
> > 
> > Also consider what field/discipline you are working in. Psychologists tend
> > to accept anything over 90% as being significant. Others it has to be over
> > 99%.
> >  DAHeiser
> > 
> > .
> > .
> > =================================================================
> > Instructions for joining and leaving this list, remarks about the
> > problem of INAPPROPRIATE MESSAGES, and archives are available at:
> > .                  http://jse.stat.ncsu.edu/                    .
> > =================================================================
.
.
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================

Re: Excel multiple regression equations...Help please!

Reply via email to