----- Original Message ----- 
From: Herman Rubin <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, November 24, 1999 10:07 AM
Subject: Re: Need to evaluate difference between two R's


| In article <[EMAIL PROTECTED]>,
| Rich Ulrich  <[EMAIL PROTECTED]> wrote:
| >On Tue, 23 Nov 1999 04:39:28 GMT, [EMAIL PROTECTED] wrote:
| 
| >> Does anyone know how one might test for significant differences
| >> between two multiple R's (or R squares) generated from two sets of data?
| >> I need to determine if two R's generated on two separate occasions
| >> using the same DV and IVs differ significantly from one another.
| 
| >Correlations are not very good candidates for comparisons, since it is
| >so easy to do tests that are more precise.
| > - to test whether the predictive relations are different, you would
| >test the regressions -- do a Chow test or the equivalent, to see if a
| >different set of regressors are needed for a different sampling.
| > - to test whether the variances are different (which is something
| >that would change the correlations), you might test variances
| >directly.
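A minimal sketch of the Chow test mentioned above, assuming two samples that share the same k regressors (the function name and the simulated data are illustrative, not from the original post):

```python
import numpy as np

def chow_test(X1, y1, X2, y2):
    """F statistic for equality of regression coefficients across two samples.

    X1, X2: (n_i, k) design matrices (include a column of ones);
    y1, y2: (n_i,) response vectors.  Returns (F, df1, df2).
    """
    def ssr(X, y):
        # Residual sum of squares from an OLS fit
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid

    n1, k = X1.shape
    n2 = X2.shape[0]
    ssr_pooled = ssr(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    ssr_separate = ssr(X1, y1) + ssr(X2, y2)
    df2 = n1 + n2 - 2 * k
    F = ((ssr_pooled - ssr_separate) / k) / (ssr_separate / df2)
    return F, k, df2

# Example: two samples generated from the SAME model, so F should be modest
rng = np.random.default_rng(0)
x1 = rng.normal(size=200); x2 = rng.normal(size=200)
y1 = 2.0 + 0.75 * x1 + rng.normal(size=200)
y2 = 2.0 + 0.75 * x2 + rng.normal(size=200)
X1 = np.column_stack([np.ones(200), x1])
X2 = np.column_stack([np.ones(200), x2])
F, df1, df2 = chow_test(X1, y1, X2, y2)
print(F, df1, df2)
```

Under the null of identical coefficients, F follows an F(k, n1+n2-2k) distribution, so the two regressions (not the two R's) are what get compared.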
| 
| This is correct.  In fact, it is generally the case that
| correlations, except as measures of how well the model
| fits, do not have any real meaning.
| 
| Even the amount of variance explained can change
| drastically with a change in design, while the parameters
| of the model do not change, provided no normalizations
| are done.
| For example, if one has a "normal" model with correlation
| coefficient .5, 25% of the variance is explained.  Now 
| suppose that the predictor variable is selected to be
| 2 standard deviations away from the mean, equally likely
| to be in either direction.  Then the correlation becomes
| .756, and the proportion of the variance explained goes
| up to 57%.  But the prediction model is still the same.
| -- 
| This address is for information only.  I do not claim that these views
| are those of the Statistics Department or of Purdue University.
| Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
| [EMAIL PROTECTED]         Phone: (765)494-6054   FAX: (765)494-0558
| 
------------------------------------------------------ 
Herman --

Great comment!

Discussions about correlation coefficients arise
periodically on various lists. So when the time seems 
appropriate I resend an old message (see below and the WORD 
attachment) that might be of interest.

IMHO there is too much time spent on the correlation coefficient,
since it is of limited and sometimes misleading value
for practical decision-making in the real world.  However,
there are still some folks who are adjusting correlation
coefficients for "restriction of range" in hopes that it
might be useful.

-- Joe
*************************************************************  
Joe Ward                           Health Careers High School 
167 East Arrowhead Dr              4646 Hamilton Wolfe           
San Antonio, TX 78228-2402         San Antonio, TX 78229      
Phone:  210-433-6575               Phone: 210-617-5400        
Fax: 210-433-2828                  Fax: 210-617-5423             
[EMAIL PROTECTED]            
http://www.ijoa.org/joeward/wardindex.html                                   
************************************************************* 



---------- Forwarded message ----------
Date: Fri, 23 May 1997 09:30:20 -0400 (EDT)
From: Mike Palij <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: Testing basic statistical concepts

I'd like to thank Joe Ward for reminding us of this situation
(his posting is appended below), as well as jogging my own
memory for a previous posting I had made.  A while back I
had posted the Anscombe dataset (in the context of an SPSS
program) which also clearly shows the benefit of plotting
the data:  the four datasets produce almost identical
Pearson r values, but only one actually shows the classic
scatterplot; the others show a nonlinear pattern or the
influence that a single point has on the calculation of r.
What does the value of r tell us here?  Aren't the basic 
statistical concepts to be learned in this situation far 
more important and most clearly seen through a coordination
of the graphical and numerical information?
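Mike's point about Anscombe's quartet can be reproduced in a few lines; these are the published values from Anscombe (1973), and all four datasets give nearly the same Pearson r:

```python
import numpy as np

# Anscombe's quartet: datasets I-III share the same x values
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
quartet = {
    "I":   (x, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])),
    "II":  (x, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (x, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV":  (np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float),
            np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])),
}
rs = {name: np.corrcoef(xi, yi)[0, 1] for name, (xi, yi) in quartet.items()}
for name, r in rs.items():
    print(name, round(r, 3))  # all four are ~0.816
```

Only a scatterplot distinguishes the linear cloud (I) from the curve (II), the single outlier (III), and the single leverage point (IV), which is exactly why r alone tells us so little here.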

-Mike Palij/Psychology Dept/New York University

Joe H Ward <[EMAIL PROTECTED]> writes:
 To Mike et al --
 
 There have been several messages related to the Simple Correlation
 Coefficient.  IMHO, out in the "real world" of practical
 decision-making the correlation coefficient has very limited value and
 sometimes dangerous consequences.  The correlation coefficient may be
 an important topic for the history of statistics, to learn the problems
 associated with its use.
 
 Attached below is an item that I submitted a long time ago, and it may be 
 of interest to those following the discussion of "r".
 
 -- Joe
 
 NON-RANDOM SAMPLING AND REGRESSION 
  
    -- PROVIDED (MANY YEARS AGO) BY
  JACK SCHMID, UNIV. OF NORTHERN COLORADO, GREELEY, COLORADO
 
 y from (MU=0, SIGMA = 1.25)
 x from (MU=0, SIGMA = 1.00)
 RHOxy = .60
 
 Sample 10,000 cases at each level of progressive TRUNCATION ON x.
 
 Regression equation:  y = bx + a
 %Remaining  ybar  xbar   sigmay sigmax   r=BETA     b    a    Syx
 ________________________________________________________________
 100%        .01   .02    1.25   1.00     .60      .75  -.01  1.00
  90%       -.15  -.19    1.18    .85     .53      .74  -.01  1.00
  80%       -.27  -.35    1.15    .76     .49      .74  -.01  1.00
  70%       -.38  -.50    1.13    .70     .45      .73  -.02  1.01
  60%       -.49  -.65    1.11    .64     .42      .72  -.02  1.01 
  50%       -.59  -.80    1.10    .59     .40      .74   .00  1.01
  40%       -.71  -.96    1.09    .55     .38      .76   .03  1.01
  30%       -.84 -1.15    1.08    .51     .36      .77   .04  1.01
  20%      -1.03 -1.39    1.06    .46     .33      .77   .03  1.00
  10%      -1.32 -1.75    1.04    .39     .28      .75  -.01  1.00
 
 ******* Students (or anyone who uses CORRELATION COEFFICIENTS) can observe
 that a correlation can be made to take almost any value by
 "carefully" selecting the data or by using data that have been truncated!
 However, the regression coefficient b (the slope of the line) and Syx
 are much more stable under various restrictions on the data.
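Schmid's table can be regenerated with a short simulation. A sketch, assuming the generating model implied by his parameters (slope b = rho*sigma_y/sigma_x = 0.60*1.25/1.00 = 0.75 with unit residual SD, which gives sigma_y = 1.25); the sample size and the retention levels shown are arbitrary choices:

```python
import numpy as np

# x ~ N(0, 1); y = 0.75*x + e with sd(e) = 1.00, so sd(y) = 1.25, rho = .60
rng = np.random.default_rng(2)
n = 400_000
x = rng.normal(size=n)
y = 0.75 * x + rng.normal(size=n)

print(f"{'%remain':>8} {'r':>6} {'b':>6} {'Syx':>6}")
for pct in (100, 70, 40, 10):
    keep = x <= np.quantile(x, pct / 100)   # progressive truncation on x from above
    xk, yk = x[keep], y[keep]
    r = np.corrcoef(xk, yk)[0, 1]
    b, a = np.polyfit(xk, yk, 1)            # OLS slope and intercept
    syx = np.std(yk - (b * xk + a))         # residual SD
    print(f"{pct:>7}% {r:6.2f} {b:6.2f} {syx:6.2f}")
```

As in the table, r falls from about .60 toward about .28 as the x range is restricted, while b stays near .75 and Syx near 1.00: the correlation depends on the sampling design, the regression parameters do not.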
 -------------------------------


schmid.doc
