Hi Gus,
major snip...
> My point is that it should not matter how you choose the sample. I am a
> bit worried about trimming off the tails. The effect should be observable
> no matter what. But I'll grant you that maybe trimming the tails may
> enhance the effect or make it more visible.
I strongly disagree with your assertion that it would make more sense if we
did not need to trim. When we use normally distributed causes x1 and x2,
there will be few if any incidents in which extreme x1s are paired with
similarly extreme x2s. There is nothing magical about this. The extremes
of y will tend to be a function of extremes of either x1 or x2, not
both, since extremes of both together are so rare. This is what Bunge calls
disjunctive causation. CR should not work with such disjunctive data since
the rationale of CR is based on conjoint causation. It is perfectly logical
that CR would not work when x1 and x2 are normally distributed. With
disjunctive causation, there is no unified causal mechanism. By trimming, we
increase the number of x1 and x2 values in the new extremes, increasing the
opportunities for the new x1 and x2 extremes to occur together.
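
A minimal Python sketch of this argument. The cutoff of 1 SD for trimming,
the "outer 20% of the observed range" definition of an extreme, and the
helper name outer_fraction are all assumptions chosen for illustration; they
are not part of CR itself.

```python
import random

def outer_fraction(pairs, frac=0.2):
    """Share of pairs in which BOTH x1 and x2 fall in the outer `frac`
    of their own observed range (at either end)."""
    def is_outer(values):
        lo, hi = min(values), max(values)
        band = frac * (hi - lo)
        return [v <= lo + band or v >= hi - band for v in values]
    out1 = is_outer([p[0] for p in pairs])
    out2 = is_outer([p[1] for p in pairs])
    both = sum(1 for a, b in zip(out1, out2) if a and b)
    return both / len(pairs)

random.seed(0)
# Two independent, normally distributed causes.
pairs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(10000)]
# Trim: keep only pairs where both causes lie within 1 SD of the mean
# (an assumed cutoff, for illustration only).
trimmed = [(a, b) for a, b in pairs if abs(a) <= 1 and abs(b) <= 1]

before = outer_fraction(pairs)
after = outer_fraction(trimmed)
print(f"untrimmed: {before:.4f}  trimmed: {after:.4f}")
```

Under these assumptions the trimmed sample should show a far larger share of
pairs in which both causes sit in their (new) extremes at the same time.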
>
> > It sounds to me that when you collect subsamples you are
> > selecting y values somehow so that you are building in
> > additional dependencies between the collected x values and
> > the y values.
> >
> This is impossible, like I said. When I construct y as the sum of x1 and
> x2, then y is the effect and x1 and x2 are the causes. This fact is not
> altered in the least by my decision to report only every tenth set of
> values, or every one hundredth, or any other subset. (At least in my
> definition of "cause". If you disagree on this point, then there is
> indeed no purpose in continuing.) Whether the causal effect is _visible_
> or not is of course another matter.
If you simply took every fifth or tenth value, then you are collecting
uniform subsamples of a normal distribution. This will not work because you
are not allowing for coincidences of the extremes of x1 and x2. They are
still very rare and do not tend to occur together. Thus you are merely
subsampling uniformly from normal distributions! In doing so, you are not
filling out the corners of the cross tabulation of x1 and x2. There will
still not be data in which similar values of x1 and x2 are crossed in their
extremes. So we need to talk about what we mean by uniform distributions.
Merely having the same number of observations across the levels of the
causes does not accomplish anything if we still do not have observations in
which extremes of x1 and x2 are paired. Your sampling method prevents us from
seeing the conjoint production of y by the combination of extremes of both
x1 and x2 simultaneously. The point of using uniform variables is to get
observations where y is the product of extremes of both x1 and x2
simultaneously. When we trim data, we go down far enough into the
distributions of both x1 and x2 to allow for coincidences of extremes of
both x1 and x2. The way you are apparently doing it, all you are doing is
duplicating the normal distributions that prevent conjoint causation. The
result may look uniform but it is really a disguised summary of normal
distributions. Doing the sampling this way suggests that you do not SEE the
point of uniform sampling. It is not uniformity for uniformity's sake; it is
to get a sample of conjoint causation all along the ranges of the causes.
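
The difference can be sketched in Python. This is only an illustration under
assumptions of my choosing: a 4x4 cross tabulation with equal-width bins over
each observed range, "every tenth value" as the systematic subsample, and a
hypothetical helper named corner_share.

```python
import random

def corner_share(pairs, bins=4):
    """Share of pairs landing in the four corner cells of a bins x bins
    cross tabulation, using equal-width bins over each observed range."""
    def bin_index(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins
        # Clamp so the maximum value falls in the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]
    b1 = bin_index([p[0] for p in pairs])
    b2 = bin_index([p[1] for p in pairs])
    edge = (0, bins - 1)
    hits = sum(1 for i, j in zip(b1, b2) if i in edge and j in edge)
    return hits / len(pairs)

random.seed(1)
normal_pairs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20000)]
# Taking every tenth pair keeps the normal shape; it only thins the sample.
every_tenth = normal_pairs[::10]
# Causes that are themselves uniformly distributed, for comparison.
uniform_pairs = [(random.uniform(-1, 1), random.uniform(-1, 1))
                 for _ in range(2000)]

print("normal, all pairs   :", corner_share(normal_pairs))
print("normal, every tenth :", corner_share(every_tenth))
print("uniform causes      :", corner_share(uniform_pairs))
```

With uniform causes the corner cells should hold roughly a quarter of the
pairs in a 4x4 table, while both normal samples leave the corners nearly
empty, whether or not they are thinned.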
My rationale goes back to the notion of what I called manifolds in my last
publication and in hundreds of previous newslist postings. A manifold is
the complete crossing of two causes:
4x4 Additive Manifold

x1   x2    y
 1    1    2
 1    2    3
 1    3    4
 1    4    5
 2    1    3
 2    2    4
 2    3    5
 2    4    6
 3    1    4
 3    2    5
 3    3    6
 3    4    7
 4    1    5
 4    2    6
 4    3    7
 4    4    8
The manifold is the prototype of conjunctive causation. All levels of x1
and x2 are crossed. Manifolds will be approximated when the causes are
uniformly sampled and there are enough observations. Does your sampling
strategy produce manifolds? I doubt it.
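
For concreteness, a short Python sketch that builds the 4x4 additive manifold
and checks whether a set of observations is a complete crossing. The function
name is_complete_crossing is a hypothetical helper introduced only for this
illustration.

```python
from itertools import product

levels = [1, 2, 3, 4]

# Every level of x1 crossed with every level of x2, with y = x1 + x2.
manifold = [(x1, x2, x1 + x2) for x1, x2 in product(levels, levels)]

def is_complete_crossing(rows, levels):
    """True if every (x1, x2) combination of the given levels is observed."""
    observed = {(x1, x2) for x1, x2, _ in rows}
    return observed == set(product(levels, levels))

for x1, x2, y in manifold:
    print(x1, x2, y)
```

Dropping even one row, for example the (4, 4) corner, makes
is_complete_crossing return False, which is the situation I expect when
extremes of x1 and x2 are never paired.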
Bill
> Maybe we should in fact get agreement on this point before I try to
> explain the sampling scheme to you again.
>
> (rest snipped.)
>
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
. http://jse.stat.ncsu.edu/ .
=================================================================