[EMAIL PROTECTED] wrote:
> Hi Gus,
>
> major snip...
>
> >
> > > It sounds to me that when you collect subsamples you are
> > > selecting y values somehow so that you are building in
> > > additional dependencies between the collected x values and
> > > the y values.
> > >
> > This is impossible, like I said. When I construct y as the sum of x1 and
> > x2, then y is the effect and x1 and x2 are the causes. This fact is not
> > altered in the least by my decision to report only every tenth set of
> > values, or every one hundredth, or any other subset. (At least in my
> > definition of "cause". If you disagree on this point, then there is
> > indeed no purpose in continuing.) Whether the causal effect is _visible_
> > or not is of course another matter.
>
> If you simply counted every 5th or tenth value then you are collecting
> uniform subsamples of a normal distribution. This will not work because you
> are not allowing for coincidences of the extremes of x1 and x2. They are
> still very rare and do not tend to occur together. Thus you are merely
> subsampling uniformly from normal distributions! In doing so, you are not
> filling out the corners of the cross tabulation of x1 and x2. There will
> still not be data in which similar values of x1 and x2 are crossed in their
> extremes. So we need to talk about what we mean by uniform distributions.
That is not what I meant, so let's back up. What I mean by a uniform random
variable in one dimension is something that has the probability density
1/(b-a) I_{[a,b]}(x), where I_{[a,b]} is the indicator function of the
interval [a,b]. That is, the probability that a realization of this random
variable falls into any subinterval of [a,b] of length delta depends only on
delta, not on the endpoints of the subinterval. The Excel function RAND()
returns realizations of such a uniformly distributed random variable
(on [0,1]).
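The defining property above (the probability of a subinterval depends only on its length) can be checked empirically. What follows is a minimal sketch, not part of the original exchange: it assumes Python's random.random() as a stand-in for Excel's generator, and the helper name interval_fraction, the seed, and the sample size are all arbitrary choices made for illustration.

```python
import random

def interval_fraction(samples, start, delta):
    """Fraction of samples falling in [start, start + delta)."""
    return sum(start <= s < start + delta for s in samples) / len(samples)

rng = random.Random(0)  # fixed seed, chosen arbitrarily, for reproducibility
samples = [rng.random() for _ in range(100_000)]  # uniform draws on [0, 1)

delta = 0.1
# Three subintervals of equal length but different endpoints.
for start in (0.0, 0.35, 0.9):
    print(start, round(interval_fraction(samples, start, delta), 3))
```

Each printed fraction should come out close to delta, wherever the subinterval starts.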
Let's say I collect a sample of size 100 from the two uniformly distributed
random variables x1 and x2 and I compute y = x1 + x2. In this sample y is
caused by x1 and x2, the way I understand it. (Do you agree?)
I don't want to give the entire sample here, but let's say it looks like this:
Row    x1     x2     y
1      0.47   0.15   0.62
2      0.71   0.43   1.14
3      0.77   0.87   1.64
...
100    0.50   0.74   1.24
Now suppose I take a subsample from this, for instance,
I select rows 2, 7, 16, 33, 39, 54, 66, 71, 90, 99.
In this subsample, is y still caused by x1 and x2 or not?
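The construction can be reproduced in a few lines. This is only a sketch under stated assumptions: Python's random.random() stands in for Excel's generator, the seed is arbitrary, and the generated values will not match the rows shown above. It confirms only that the rule y = x1 + x2 holds in every selected row; it does not by itself settle what a statistical analysis of the subsample would show.

```python
import random

rng = random.Random(1)  # arbitrary seed, for reproducibility only

# Full sample: 100 rows of (row number, x1, x2, y) with y = x1 + x2.
rows = []
for i in range(1, 101):
    x1 = rng.random()
    x2 = rng.random()
    rows.append((i, x1, x2, x1 + x2))

# Subsample chosen by row number only, without looking at x1, x2, or y.
picked = [2, 7, 16, 33, 39, 54, 66, 71, 90, 99]
subsample = [row for row in rows if row[0] in picked]

# The rule that generated y is unchanged in the subsample.
for i, x1, x2, y in subsample:
    assert y == x1 + x2
    print(f"{i:3d}  {x1:.2f}  {x2:.2f}  {y:.2f}")
```

Because the rows are picked by index alone, the selection cannot depend on the values of x1, x2, or y.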