Gus,

Does the procedure you use fill out the corners of a cross tabulation of x1 and x2? Are the intervals of equal width?
Bill

"Gus Gassmann" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]...
>
> [EMAIL PROTECTED] wrote:
>
> > Hi Gus,
> >
> > major snip...
> >
> > >
> > > > It sounds to me that when you collect subsamples you are
> > > > selecting y values somehow so that you are building in
> > > > additional dependencies between the collected x values and
> > > > the y values.
> > > >
> > > This is impossible, like I said. When I construct y as the sum of x1
> > > and x2, then y is the effect and x1 and x2 are the causes. This fact
> > > is not altered in the least by my decision to report only every tenth
> > > set of values, or every one hundredth, or any other subset. (At least
> > > in my definition of "cause". If you disagree on this point, then there
> > > is indeed no purpose in continuing.) Whether the causal effect is
> > > _visible_ or not is of course another matter.
> >
> > If you simply counted every 5th or tenth value then you are collecting
> > uniform subsamples of a normal distribution. This will not work because you
> > are not allowing for coincidences of the extremes of x1 and x2. They are
> > still very rare and do not tend to occur together. Thus you are merely
> > subsampling uniformly from normal distributions! In doing so, you are not
> > filling out the corners of the cross tabulation of x1 and x2. There will
> > still not be data in which similar values of x1 and x2 are crossed in their
> > extremes. So we need to talk about what we mean by uniform distributions.
>
> That is not what I meant, so let's back up. What I mean by a uniform random
> variable in one dimension is something that has the probability density
> 1/(b-a) I_{[a,b]}(x), that is, the probability that a realization of this
> random variable falls into any subinterval of [a,b] of length delta depends
> only on delta, not on the endpoints of the subinterval.
> The Excel function RAND() spits out such uniformly distributed random
> variables (on [0,1]).
>
> Let's say I collect a sample of size 100 from the two uniformly distributed
> random variables x1 and x2 and I compute y = x1 + x2. In this sample y is
> caused by x1 and x2, the way I understand it. (Do you agree?)
>
> I don't want to give the entire sample here, but let's say it looks like this:
>
>   Row    x1    x2     y
>     1  0.47  0.15  0.62
>     2  0.71  0.43  1.14
>     3  0.77  0.87  1.64
>   ...
>   100  0.50  0.74  1.24
>
> Now suppose I take a subsample from this, for instance,
> I select rows 2, 7, 16, 33, 39, 54, 66, 71, 90, 99.
> In this subsample, is y still caused by x1 and x2 or not?
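
For anyone who wants to try the quoted experiment, here is a minimal sketch of it. It assumes Python's random.random() as a stand-in for Excel's RAND(). The seed, the helper name fraction_in, and the two test intervals are my own choices for illustration and are not from the thread. The sketch builds the sample of 100 rows, takes the listed subsample, and then checks the "depends only on delta" property of the uniform density on a larger set of draws.

```python
import random

rng = random.Random(1)  # fixed seed: an assumption, used only for reproducibility

# Sample of size 100: x1 and x2 uniform on [0, 1), y constructed as their sum.
rows = []
for _ in range(100):
    x1 = rng.random()
    x2 = rng.random()
    rows.append((x1, x2, x1 + x2))

# The subsample listed in the post (1-based row numbers).
picked = [2, 7, 16, 33, 39, 54, 66, 71, 90, 99]
subsample = [rows[i - 1] for i in picked]

# Selecting rows does not change how y was built: y == x1 + x2 in every row.
print(all(y == x1 + x2 for x1, x2, y in subsample))


def fraction_in(values, start, delta):
    """Fraction of values falling in the half-open interval [start, start + delta)."""
    return sum(start <= v < start + delta for v in values) / len(values)


# Uniform density: two subintervals of the same length delta should capture
# roughly the same fraction of draws, wherever they sit inside [0, 1].
draws = [rng.random() for _ in range(100_000)]
print(fraction_in(draws, 0.1, 0.2), fraction_in(draws, 0.6, 0.2))
```

The check only shows that the constructed relation y = x1 + x2 survives row selection. Whether that relation is _visible_ in a small subsample is a separate question, as the quoted text already says.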

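
The question about the corners of the cross tabulation can be checked in the same spirit. This is only a sketch under assumptions: x1 and x2 are uniform on [0, 1), the table uses 5 equal-width intervals per variable, and the function names crosstab_equal_width and corner_counts are hypothetical helpers, not anything either poster described using.

```python
import random


def crosstab_equal_width(x1, x2, bins=5, lo=0.0, hi=1.0):
    """Count (x1, x2) pairs in a bins-by-bins grid of equal-width intervals."""
    width = (hi - lo) / bins
    table = [[0] * bins for _ in range(bins)]
    for a, b in zip(x1, x2):
        # min(...) puts a value equal to hi into the last interval
        i = min(int((a - lo) / width), bins - 1)
        j = min(int((b - lo) / width), bins - 1)
        table[i][j] += 1
    return table


def corner_counts(table):
    """Return the counts in the four corner cells of the table."""
    last = len(table) - 1
    return {
        "low-low": table[0][0],
        "low-high": table[0][last],
        "high-low": table[last][0],
        "high-high": table[last][last],
    }


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
n = 1000
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
table = crosstab_equal_width(x1, x2)
print(corner_counts(table))
```

With uniform x1 and x2 every cell, corners included, has the same expected count (n divided by the number of cells), which is the contrast with subsampling from normal distributions raised in the quoted exchange.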