My Apologies -- "Haste makes waste!"

Please note the serious errors in my previous message!

The model (1) below should have read:

    (1)    Y = a1*U + a2*X + E  
 
and not

    (1)    Y = a1*U + a1*X + E  

and wherever there were statements about the
hypothesis "a1=0", they should read "a2=0".

:-(

-- Joe

----- Original Message ----- 
From: Joe Ward <[EMAIL PROTECTED]>
To: bkamen <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Cc: APSTAT-L <[EMAIL PROTECTED]>
Sent: Tuesday, January 04, 2000 3:08 PM
Subject: Re: Correlation - Constraints on Variables


In the beginning, all information is BINARY/CATEGORICAL (not DUMMY).

I refer to models of the very general form:

Y = a1*X1 + a2*X2 + a3*X3 + ... + ap*Xp + E

as Prediction/Regression/Linear Models.

The predictors X1, X2, X3, ..., Xp can be defined in many ways.
a1, a2, a3, ..., ap are usually least-squares coefficients, i.e., the
values that MINIMIZE THE SUM OF SQUARES OF THE ELEMENTS
OF THE "Error" 'E'.
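
To make the general form concrete, here is a minimal pure-Python sketch
(my own illustration, not code from Ward & Jennings) that finds the
least-squares coefficients a1, ..., ap by solving the normal equations
(X'X)a = X'Y. The helper names `solve` and `fit_linear_model` are
hypothetical, and the five data points are invented for illustration.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_linear_model(predictors, y):
    """Least-squares fit of Y = a1*X1 + ... + ap*Xp + E (hypothetical helper).

    predictors: list of p vectors X1..Xp, each the same length as y.
    Returns (coefficients [a1..ap], sum of squares of the elements of E).
    """
    p = len(predictors)
    # Normal equations (X'X) a = X'Y; their solution minimizes sum(E**2).
    xtx = [[sum(u * v for u, v in zip(predictors[i], predictors[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(u * v for u, v in zip(predictors[i], y)) for i in range(p)]
    a = solve(xtx, xty)
    errors = [yn - sum(a[k] * predictors[k][n] for k in range(p))
              for n, yn in enumerate(y)]
    return a, sum(e * e for e in errors)


# Invented data: U has every element equal to 1, X is a continuous predictor.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
U = [1.0] * 5
X = [1.0, 2.0, 3.0, 4.0, 5.0]
coefficients, sse = fit_linear_model([U, X], y)
print(coefficients, sse)
```

Passing [U, X] gives the two-predictor model discussed next; any other set
of predictor vectors (0/1 indicators, products, and so on) goes through the
same function. Solving the normal equations directly is fine for small
teaching examples, though a QR-based solver is numerically safer for real data.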

If the model is of the form:

    (1)    Y = a1*U + a2*X + E
where
Y = a dependent variable, usually "continuous" (Mile run time, Blood pressure)
U = a predictor with every element equal 1 
X = a continuous variable, e.g. Age, Height, Weight, Test Score
E = "error", sometimes called "residual"

then the model is sometimes called "simple regression".

In this form, a test of the Hypothesis a2=0 is sometimes called a
test of "ZERO CORRELATION" or "SLOPE = 0".
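
A small pure-Python check (hypothetical helper names, invented data) that
three usual ways of testing a2=0 in model (1) agree: the t statistic for
the slope, the t statistic computed from the correlation r, and the F
statistic from comparing the full model with the restricted model
Y = a1*U + E. The two t values are identical, and F equals t squared.

```python
from math import sqrt


def simple_regression(x, y):
    """Least-squares fit of Y = a1*U + a2*X + E; returns (a1, a2, SSE, Sxx, Syy, Sxy)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    a2 = sxy / sxx
    a1 = ybar - a2 * xbar
    sse = sum((yi - a1 - a2 * xi) ** 2 for xi, yi in zip(x, y))
    return a1, a2, sse, sxx, syy, sxy


def three_tests_of_a2_zero(x, y):
    n = len(y)
    a1, a2, sse, sxx, syy, sxy = simple_regression(x, y)
    # 1) "SLOPE = 0": a2 divided by its standard error.
    t_slope = a2 / sqrt(sse / (n - 2) / sxx)
    # 2) "ZERO CORRELATION": t computed from the Pearson correlation r.
    r = sxy / sqrt(sxx * syy)
    t_corr = r * sqrt(n - 2) / sqrt(1 - r * r)
    # 3) Full vs restricted model. The restricted model Y = a1*U + E has the
    #    mean of Y as its a1, so its error sum of squares is Syy.
    df_num, df_den = 1, n - 2  # one restriction (a2 = 0); n - 2 error df in the full model
    f = ((syy - sse) / df_num) / (sse / df_den)
    return t_slope, t_corr, f


t_slope, t_corr, f = three_tests_of_a2_zero([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(t_slope, t_corr, f)
```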

Now consider Model 1 as above:

    (1)    Y = a1*U + a2*X + E

and we let
Y = a dependent variable, usually "continuous" (Mile run time, Blood pressure)
U = a predictor with every element equal 1 
(as above)
but
X = 1 if the Y observation is from a Male; 0 if the Y observation is from a Female.

In this model, a test of the Hypothesis a2=0 is sometimes called a
test of the hypothesis that
  Expected Value of Y (Mean) for Males = Expected Value of Y (Mean) for Females
or
a "t-test for the difference between two means".
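
This bears directly on the original question. The pure-Python sketch
below (hypothetical helper names; the Male/Female values are invented)
computes the ordinary Pearson correlation between a continuous Y and the
0/1 variable X (often called the point-biserial correlation), converts
it to a t statistic, and compares that with the classic pooled
two-sample t-test. The two numbers come out identical, so correlating Y
with a categorical 0/1 X is the same test as the t-test for two means,
resting on the same assumptions (normal errors with equal variance in
both groups).

```python
from math import sqrt


def pearson_r(x, y):
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic (n - 2 df) for the hypothesis of zero correlation."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def pooled_two_sample_t(males, females):
    """Equal-variance t-test for the difference between two means."""
    n1, n0 = len(males), len(females)
    m1, m0 = sum(males) / n1, sum(females) / n0
    ss_within = sum((v - m1) ** 2 for v in males) + sum((v - m0) ** 2 for v in females)
    sp2 = ss_within / (n1 + n0 - 2)  # pooled variance estimate
    return (m1 - m0) / sqrt(sp2 * (1 / n1 + 1 / n0))


# Invented illustrative data.
males = [10.0, 12.0, 11.0, 13.0]
females = [8.0, 9.0, 7.0, 10.0]
y = males + females
x = [1] * len(males) + [0] * len(females)  # X = 1 for Male, 0 for Female

r = pearson_r(x, y)  # correlation with a 0/1 predictor
print(t_from_r(r, len(y)), pooled_two_sample_t(males, females))
```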

Other special forms of the GENERAL MODEL are called different names, such as 
One-way Analysis of Variance (ANOVA), Analysis of Covariance, Two-way Analysis of 
Variance, etc.
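
As one illustration of a "different name" for the same model, here is a
sketch of one-way ANOVA done as a comparison of two least-squares models
(hypothetical function names, invented data; this is my reading of the
full-versus-restricted approach, not code from the book). The full model
uses one 0/1 indicator per group; the restricted model imposes equal
group means and keeps only U.

```python
def error_ss_full(groups):
    # Full model: Y = a1*G1 + a2*G2 + ... + ak*Gk + E, where Gj = 1 if the
    # observation is in group j, else 0. With mutually exclusive indicators the
    # least-squares aj is the mean of group j, so E is each value minus its group mean.
    total = 0.0
    for g in groups:
        m = sum(g) / len(g)
        total += sum((v - m) ** 2 for v in g)
    return total


def error_ss_restricted(groups):
    # Restricted model under the hypothesis a1 = a2 = ... = ak: Y = a0*U + E,
    # whose least-squares a0 is the grand mean.
    values = [v for g in groups for v in g]
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def full_vs_restricted_f(groups):
    n, k = sum(len(g) for g in groups), len(groups)
    sse_full = error_ss_full(groups)
    sse_restricted = error_ss_restricted(groups)
    df_full, df_restricted = n - k, n - 1
    return ((sse_restricted - sse_full) / (df_restricted - df_full)) / (sse_full / df_full)


# Invented data for three groups.
print(full_vs_restricted_f([[4, 5, 6], [6, 7, 8], [9, 10, 11]]))
```

For these data the error sums of squares are 6 (full) and 44 (restricted),
giving F = 19 with 2 and 6 degrees of freedom, the usual one-way ANOVA F.
With only two groups the same function returns the square of the pooled
two-sample t.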

Before we had high-speed computers, we needed special, easy-to-calculate
computational procedures.  WE SHOULD NOT BE CONSTRAINED BY THEM NOW THAT WE
HAVE THE COMPUTER POWER.

Many seemingly different statistical procedures can be carried out under ONE
GENERAL FORM.

But most importantly, the ONE GENERAL FORM can be used to create models that fit
unique research questions.

The material at the URL shown below is related to your question.

If you would like to see some detailed examples, you may want to look in a university
library for:
Introduction to Linear Models, by Ward & Jennings, Prentice-Hall, 1973.

Copies of this book are available from the Institute for Job and Occupation Analysis:

Jimmy L. Mitchell, Ph.D., Director 
[EMAIL PROTECTED]
10010 San Pedro, Suite 440, San Antonio, Texas 78216
(210) 349-8525   Fax: (210) 349-0168


--- Joe
************************************************************************ 
* Joe Ward                                  Health Careers High School *
* 167 East Arrowhead Dr                     4646 Hamilton Wolfe        *
* San Antonio, TX 78228-2402                San Antonio, TX 78229      *
* Phone: 210-433-6575                       Phone: 210-617-5400        *
* Fax: 210-433-2828                         Fax: 210-617-5423          *
* [EMAIL PROTECTED]                                                    *
* http://www.ijoa.org/joeward/wardindex.html                           *
************************************************************************

----- Original Message ----- 
From: bkamen <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, January 02, 2000 12:17 PM
Subject: Correlation - Constraints on Variables


| This practical question arose between myself and a colleague at work.
| It concerns whether we can use correlation analysis if one of the
| variables is non-continuous or "categorical." She believes that both
| variables must be continuous. However, she cannot say why, and I cannot
| find any such constraint in the statistics book I have relied on since
| graduating in Industrial Engineering a few years ago, Miller and Freund,
| 'Probability and Statistics for Engineers.'
|
| I have been thinking that if x is discrete and can assume only a few
| values compared with y, which is continuous, the correlation study may
| yield a high probability of type-one error. I interpret this as
| providing insufficient evidence with which to reject the null
| hypothesis. But I have not thought of this as an inappropriate use of
| correlation.
|
| On the other hand, in attempting to probe Miller and Freund, I find that
| correlation is based on the "bivariate normal distribution," the
| formula for which has numerous parameters including alpha and beta, the
| least squares regression coefficients. I am aware that to obtain the
| latter requires that the function be differentiable, hence x must also
| be continuous. This seems to support my friend's view.
|
| I would appreciate clarification of any such constraints on the
| practical use of correlation analysis. Also, if anyone can recommend a
| textbook that addresses questions such as this more directly than Miller
| and Freund, I would appreciate that also.
| 
| 
| ------=_NextPart_000_0054_01BF551B.4D1409A0
| Content-Type: text/html;
| charset="iso-8859-1"
| Content-Transfer-Encoding: quoted-printable
| 
| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
| <HTML><HEAD>
| <META content=3D"text/html; charset=3Diso-8859-1" =
| http-equiv=3DContent-Type>
| <META content=3D"MSHTML 5.00.2614.3500" name=3DGENERATOR>
| <STYLE></STYLE>
| </HEAD>
| <BODY bgColor=3D#ffffff>
| <DIV><FONT size=3D2>This practical question&nbsp;arose between myself =
| and a=20
| colleague at work.&nbsp; It concerns whether we can use correlation=20
| analysis&nbsp;if one of the variables&nbsp;is non-continuous or=20
| "categorical."&nbsp; </FONT><FONT size=3D2>She believes that both=20
| variables&nbsp;must be continuous.&nbsp;&nbsp;However she&nbsp;cannot =
| say=20
| why,&nbsp;and I cannot find any such constraint in&nbsp;the statistics =
| book I=20
| have relied on since graduating in Industrial Engineering a few years =
| ago,=20
| Miller and Freund, 'Probability and Statistics for Engineers.'&nbsp;=20
| </FONT></DIV>
| <DIV>&nbsp;</DIV>
| <DIV><FONT size=3D2>I have been&nbsp;thinking that if x is discrete and=20
| can&nbsp;assume only a few values compared with y which is continuous, =
| the=20
| correlation study may yield a high probability of type-one error.&nbsp; =
| I=20
| interpret this as&nbsp;providing insufficient evidence with which to =
| reject the=20
| null hypothesis.&nbsp; But I&nbsp;have not thought of this as&nbsp;an=20
| inappropriate use of correlation.&nbsp; </FONT></DIV>
| <DIV>&nbsp;</DIV>
| <DIV><FONT size=3D2><FONT size=3D2>On the other hand&nbsp;in attempting =
| to probe=20
| Miller and Freund I find&nbsp;that&nbsp;correlation is based on the =
| "bivariate=20
| normal distribution,"&nbsp; the formula for which has numerous =
| parameters=20
| including alpha and beta, the least squares regression=20
| coefficients.&nbsp;&nbsp;I am aware that to obtain the latter requires =
| that=20
| the&nbsp;function&nbsp;be differentiable, hence x must also=20
| be&nbsp;continuous.&nbsp; This seems to support&nbsp;my friend's=20
| view.</FONT></FONT></DIV>
| <DIV>&nbsp;</DIV>
| <DIV><FONT size=3D2>I would appreciate clarification of any such =
| constraints on=20
| the practical use of correlation analysis.&nbsp; Also, if anyone can =
| recommend a=20
| textbook that addresses questions such as this more directly&nbsp;than =
| Miller=20
| and Freund, I would appreciate that also.</FONT></DIV>
| <DIV>&nbsp;</DIV></BODY></HTML>
| 
| ------=_NextPart_000_0054_01BF551B.4D1409A0--
| 
| 
