> "Simon, Steve, PhD" schrieb:
> 
> I just wrote an S-plus function that computes the statistical measures described in
> 
> "Inferring Formal Causation from Corresponding Regressions" William V. Chambers 
> http://www.wynja.com/chambers/regression.html
> 
> but please check my logic. Here is the language in the publication.
> 
> > The answer follows from the logic of least squares
> > regression. The prediction slope when predicting x1 from y
> > will tend to be oriented by the extremes of y. An extreme
> > value can lead to a relatively extreme error, especially
> > since least squares analysis weights errors by squaring
> > them. Thus, in order to reduce the overall error, the
> > extremes of y will orient the prediction slope when
> > predicting x1 from y. Consequently, absolute errors
> > predicting x1 from y should correlate negatively with the
> > absolute values of the deviations of the y values from
> > their mean, since absolute deviations reflect extremity.
> > Error will not be asymmetrically reduced across the levels
> > of x1 when x1 serves as the predictor of y. This is because
> > the correlation between x1 and error (x2) remains uniform
> > across both extremes and mid-ranges of x variables.
> > Therefore, rde(y), the correlation of the absolute errors
> > predicting x1 from y, should be more negative than rde(x),
> > the correlation between absolute deviations from x1 and
> > absolute errors predicting y. This difference in the
> > correlations of predictor extremities and regression
> > residuals is a reflection of the asymmetrical formal
> > relationships between IVs and DVs and is the basis of the
> > method of corresponding regressions.
> 
> > A summary statistic, D, was found by subtracting rde(x)
> > from rde(y).
> 
> This is the S-plus function
> 
> > corr.reg
> function(x, y)
> {
>         ey.x <- resid(lm(y ~ x))    # errors predicting y from x
>         ex.y <- resid(lm(x ~ y))    # errors predicting x from y
>         rde.y <- cor(abs(ex.y), abs(y - mean(y)))    # rde(y)
>         rde.x <- cor(abs(ey.x), abs(x - mean(x)))    # rde(x)
>         rde.y - rde.x               # D = rde(y) - rde(x)
> }
> 
> Even if you don't use S-plus, you should be able to follow this. Note that the
> syntax lm(y~x) fits a linear regression model with y as the dependent variable.
> The rest of the syntax should be obvious.
> 
> I tried it with some random uniform variables as follows:
> 
> > x1 <- runif(50)
> > x2 <- runif(50)
> > y <- x1+x2
> > corr.reg(x1,y)
> [1] -0.7029148
> > corr.reg(x2,y)
> [1] -0.6784307
> > corr.reg(x1,x2)
> [1] -0.02448412
> 
> So far, so good. A large negative value implies that x causes y.
> A value close to zero implies that neither variable causes the other. 
> By inference, a large positive value implies that y causes x.
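> 
> One quick way to convince yourself of this sign convention is to repeat the
> simulation a number of times; here is a short sketch (the 200 replications are
> an arbitrary choice of mine):
> 
>         n.rep <- 200
>         d.vals <- numeric(n.rep)
>         for(i in 1:n.rep) {
>                 x1 <- runif(50)
>                 x2 <- runif(50)
>                 y <- x1 + x2
>                 d.vals[i] <- corr.reg(x1, y)
>         }
>         summary(d.vals)   # by the logic above, these should be mostly negative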
> 
> Then I tried it with a classic data set from Consumer Reports listed at
> 
> http://lib.stat.cmu.edu/DASL/Datafiles/Cars.html
> 
> > corr.reg(Weight,MPG)  [1] 0.4684389
> > corr.reg(Drv.Rat,MPG) [1] 0.2145594
> > corr.reg(HP,MPG)      [1] 0.1312272
> > corr.reg(Displ,MPG)   [1] 0.5197843
> > corr.reg(Cyl,MPG)     [1] 0.2893959
> 
> I think I have the algorithm coded properly, but Dr. Chambers or anyone else could
> double check these results very easily.
> 

Hmmm. Don't know why, but I get completely different results. Compare those values with
the first row of the table of bivariate tests below:

                  MPG   Weight   Drive_Ratio   Horsepower   Displacement   Cylinders
MPG              0.00     0.08         -0.03         0.06           0.04       -0.46
Weight          -0.08     0.00         -0.17        -0.04           0.09       -0.08
Drive_Ratio      0.03     0.17          0.00         0.12           0.17        0.16
Horsepower      -0.06     0.04         -0.12         0.00           0.32       -0.21
Displacement    -0.04    -0.09         -0.17        -0.32           0.00       -0.29
Cylinders        0.46     0.08         -0.16         0.21           0.29        0.00

where negative signs indicate that the column-variable "causes" the row-variable.

But I'll check my routine, too. Btw, possibly there is confusion about the order of the
variables? Check my last column against yours...
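
A quick way to check the order question with your corr.reg() (a sketch; it assumes
Weight and MPG are attached exactly as in your run above):

  corr.reg(Weight, MPG)   # your ordering
  corr.reg(MPG, Weight)   # reversed ordering -- by construction this only flips the sign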

Gottfried.



Here is my routine, for reference:

  Y  = V1   // first column of data
  X1 = V2   // second column of data

  n = colcount(Y)

  // regression
      A =  Y*X1'*inv(X1*X1') // regression coefficients into A
      Y_res = Y - A*X1       // store residual in Y_res

  X2 = zvalue(Y_res)  // the "partner" for equation Y = A*X1 + B*X2

  // Now use absolute values
  zX1 = zvalue(abs(X1))
  zX2 = zvalue(abs(X2))
  rde = skalar(zX1 * zX2') / n    // correlation between zX1 & zX2

  rde_v1 = rde  // output-parameter

;-------------------------------------------
  Y  = V2   // now test the opposite direction
  X1 = V1   //
 
   // regression
      A = Y*X1'*inv(X1*X1')      // regression coefficients into A
      Y_res = Y - A*X1           // store residual in Y_res
 
  X2 = zvalue(Y_res)

  // Now use absolute values
  zX1 = zvalue(abs(X1))
  zX2 = zvalue(abs(X2))
  rde = skalar(zX1 * zX2') / n    // correlation between zX1 & zX2

  rde_v2 = rde   // output-parameter
;-------------------------------------------


  d = rde_v2 - rde_v1
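
For easier comparison with corr.reg(), here is the same routine in S-plus notation
(a sketch: zvalue() is taken to standardize to mean 0 / sd 1, skalar() as the ordinary
inner product, and the regression is left through the origin, exactly as written above):

  zvalue <- function(v) (v - mean(v))/sqrt(var(v))   # assumed meaning of zvalue()

  rde.pair <- function(y, x1)
  {
        a <- sum(y * x1)/sum(x1 * x1)   # Y*X1'*inv(X1*X1'), regression through the origin
        y.res <- y - a * x1             # residual
        x2 <- zvalue(y.res)
        cor(abs(x1), abs(x2))           # skalar(zX1*zX2')/n, up to the n vs n-1 choice
  }

  d <- rde.pair(V2, V1) - rde.pair(V1, V2)   # rde_v2 - rde_v1; V1, V2 are the two data columns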

---------------------------------------
The value d is entry (1,2) of the table above, ~ 0.078, indicating mpg -> weight, and so on.