> "Simon, Steve, PhD" schrieb:
>
> I just wrote an S-plus function that computes the statistical measures described in
>
> "Inferring Formal Causation from Corresponding Regressions" William V. Chambers
> http://www.wynja.com/chambers/regression.html
>
> but please check my logic. Here is the language in the publication.
>
> > The answer follows from the logic of least squares
> > regression. The prediction slope when predicting x1 from y
> > will tend to be oriented by the extremes of y. An extreme
> > value can lead to a relatively extreme error, especially
> > since least squares analysis weights errors by squaring
> > them. Thus, in order to reduce the overall error, the
> > extremes of y will orient the prediction slope when
> > predicting x1 from y. Consequently, absolute errors
> > predicting x1 from y should correlate negatively with the
> > absolute values of the deviations of the y values from
> > their mean, since absolute deviations reflect extremity.
> > Error will not be asymmetrically reduced across the levels
> > of x1 when x1 serves as the predictor of y. This is because
> > the correlation between x1 and error (x2) remains uniform
> > across both extremes and mid-ranges of x variables.
> > Therefore, rde(y), the correlation of the absolute errors
> > predicting x1 from y with the absolute deviations of y from
> > their mean, should be more negative than rde(x), the
> > correlation of the absolute deviations of x1 from its mean
> > with the absolute errors predicting y. This difference in the
> > correlations of predictor extremities and regression
> > residuals is a reflection of the asymmetrical formal
> > relationships between IVs and DVs and is the basis of the
> > method of corresponding regressions.
>
> > A summary statistic, D, was found by subtracting rde(x)
> > from rde(y).
>
> This is the S-plus function
>
> > corr.reg
> function(x, y)
> {
>         # residuals from each regression direction
>         ey.x <- resid(lm(y ~ x))        # errors predicting y from x
>         ex.y <- resid(lm(x ~ y))        # errors predicting x from y
>         # correlate absolute errors with predictor extremity
>         rde.y <- cor(abs(ex.y), abs(y - mean(y)))
>         rde.x <- cor(abs(ey.x), abs(x - mean(x)))
>         # Chambers' D statistic
>         rde.y - rde.x
> }
>
> Even if you don't use S-plus, you should be able to follow this. Note that the
> syntax lm(y ~ x) fits a linear regression model with y as the dependent
> variable. The rest of the syntax should be obvious.
>
> I tried it with some random uniform variables as follows:
>
> > x1 <- runif(50)
> > x2 <- runif(50)
> > y <- x1+x2
> > corr.reg(x1,y)
> [1] -0.7029148
> > corr.reg(x2,y)
> [1] -0.6784307
> > corr.reg(x1,x2)
> [1] -0.02448412
>
> So far, so good. A large negative value implies that x causes y.
> A value close to zero implies that neither variable causes the other.
> By inference, a large positive value implies that y causes x.
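>
> Note that D is antisymmetric by construction: swapping the arguments swaps the
> roles of rde(y) and rde(x), so corr.reg(y, x) is exactly -corr.reg(x, y). For
> example, with the simulated data above:
>
> > corr.reg(y, x1)    # must return +0.7029148, the negative of corr.reg(x1, y)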
>
> Then I tried it with a classic data set from Consumer Reports listed at
>
> http://lib.stat.cmu.edu/DASL/Datafiles/Cars.html
>
> > corr.reg(Weight, MPG)
> [1] 0.4684389
> > corr.reg(Drv.Rat, MPG)
> [1] 0.2145594
> > corr.reg(HP, MPG)
> [1] 0.1312272
> > corr.reg(Displ, MPG)
> [1] 0.5197843
> > corr.reg(Cyl, MPG)
> [1] 0.2893959
>
> I think I have the algorithm coded properly, but Dr. Chambers or anyone else
> could double-check these results very easily.
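>
> To check all the pairs at once, a small wrapper can build the whole table of D
> values with corr.reg. This is only a sketch (D.matrix is just a name I made up
> here, and the call below assumes the six variables are attached as above):
>
> D.matrix <- function(dat)
> {
>         # D[i, j] = corr.reg(dat[, i], dat[, j]); a large negative entry
>         # suggests that the row variable causes the column variable, and
>         # by antisymmetry D[j, i] = -D[i, j]
>         k <- ncol(dat)
>         D <- matrix(0, k, k, dimnames = list(names(dat), names(dat)))
>         for(i in 1:k)
>                 for(j in 1:k)
>                         if(i != j)
>                                 D[i, j] <- corr.reg(dat[, i], dat[, j])
>         D
> }
>
> > D.matrix(data.frame(MPG, Weight, Drv.Rat, HP, Displ, Cyl))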
>
Hmmm. I don't know why, but I get completely different results. Compare these
values with the first row of the table of bivariate tests below:
              MPG  Weight  Drive_Ratio  Horsepower  Displacement  Cylinders
MPG          0.00    0.08        -0.03        0.06          0.04      -0.46
Weight      -0.08    0.00        -0.17       -0.04          0.09      -0.08
Drive_Ratio  0.03    0.17         0.00        0.12          0.17       0.16
Horsepower  -0.06    0.04        -0.12        0.00          0.32      -0.21
Displacement -0.04  -0.09        -0.17       -0.32          0.00      -0.29
Cylinders    0.46    0.08        -0.16        0.21          0.29       0.00
where negative signs indicate that the column-variable "causes" the row-variable.
But I'll check my routine, too. Btw, possibly there is some confusion about the
order of the variables? Check my last column against yours...
Gottfried.
Y  = V1                        // first column of data
X1 = V2                        // second column of data
n  = colcount(Y)               // number of observations (data are row vectors)
// regression (note: no intercept term)
A     = Y*X1'*inv(X1*X1')      // regression coefficient into A
Y_res = Y - A*X1               // store residual in Y_res
X2    = zvalue(Y_res)          // the "partner" for equation Y = A*X1 + B*X2
// now use absolute values
zX1 = zvalue(abs(X1))
zX2 = zvalue(abs(X2))
rde    = skalar(zX1 * zX2') / n   // correlation between zX1 & zX2
rde_v1 = rde                      // output parameter
;-------------------------------------------
Y  = V2                        // now test the opposite direction
X1 = V1
// regression (note: no intercept term)
A     = Y*X1'*inv(X1*X1')      // regression coefficient into A
Y_res = Y - A*X1               // store residual in Y_res
X2    = zvalue(Y_res)
// now use absolute values
zX1 = zvalue(abs(X1))
zX2 = zvalue(abs(X2))
rde    = skalar(zX1 * zX2') / n   // correlation between zX1 & zX2
rde_v2 = rde                      // output parameter
;-------------------------------------------
d = rde_v2 - rde_v1               // Chambers' D for this pair
---------------------------------------
The value d is entry (1,2) of the table above, ~ 0.078, indicating
MPG -> Weight, and so on.
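
For those who want to compare directly, here is a rough S-plus transcription of
the routine above. It is only a sketch: it assumes zvalue() standardizes a
vector (subtract the mean, divide by the standard deviation) and skalar() is
the inner product, and the comments mark where it appears to differ from
corr.reg earlier in this thread.

corr.reg.g <- function(v1, v2)
{
        # direction 1: predict V1 from V2; "- 1" drops the intercept, as
        # in the matrix routine above (corr.reg keeps the intercept)
        r1 <- resid(lm(v1 ~ v2 - 1))
        # zvalue() only re-centers and re-scales, and correlation is
        # unaffected by scaling, so rde_v1 amounts to
        # cor(|v2|, |r1 - mean(r1)|); note v2 itself is NOT centered
        # before abs(), unlike corr.reg's abs(x - mean(x))
        rde.v1 <- cor(abs(v2), abs(r1 - mean(r1)))
        # direction 2: predict V2 from V1
        r2 <- resid(lm(v2 ~ v1 - 1))
        rde.v2 <- cor(abs(v1), abs(r2 - mean(r2)))
        rde.v2 - rde.v1
}

Differences like the missing intercept and the uncentered predictor could
account for part of the discrepancy between the two sets of results.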