Re: Regression CIs (was: Normally Distributed ANOVA FACTORS?)

Gus Gassmann Sun, 29 Sep 2002 17:15:11 -0700

[EMAIL PROTECTED] wrote:

> Hi Gus,
>
> major snip...
>
> >
> > >       It sounds to me that when you collect subsamples you are
> > >      selecting y values somehow so that you are building in
> > >      additional dependencies between the collected x values and
> > >      the y values.
> > >
> > This is impossible, like I said. When I construct y as the sum of x1 and
> > x2, then y is the effect
> > and x1 and x2 are the causes. This fact is not altered in the least by
> > my decision to report
> > only every tenth set of values, or every one hundredth, or any other
> > subset. (At least in my
> > definition of "cause". If you disagree on this point, then there is
> > indeed no purpose in
> > continuing.) Whether the causal effect is _visible_ or not is of course
> > another matter.
>
> If you simply counted every 5th or tenth value then you are collecting
> uniform subsamples of a normal distribution. This will not work because you
> are not allowing for coincidences of the extremes of x1 and x2. They are
> still very rare and do not tend to occur together. Thus you are merely
> subsampling uniformly from normal distributions! In doing so, you are not
> filling out the corners of the cross tabulation of x1 and x2. There will
> still not be data in which similar values of x1 and x2 are crossed in their
> extremes. So we need to talk about what we mean by uniform distributions.

That is not what I meant, so let's back up. What I mean by a uniform random
variable in one dimension is something that has the probability density
1/(b-a) I_{[a,b]}(x), that is, the probability that a realization of this random
variable falls into any subinterval of [a,b] of length delta depends only on
delta, not on the endpoints of the subinterval. The Excel function RAND() spits
out such uniformly distributed random variables (on [0,1]).
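
As a rough numerical check of that definition, here is a minimal sketch in
Python. It assumes that Python's random.random() can stand in for the Excel
function; the seed, the sample size, the interval starts and the helper name
fraction_in_interval are arbitrary choices made for illustration only.

```python
import random

def fraction_in_interval(samples, lo, hi):
    """Share of the samples that fall into the half-open interval [lo, hi)."""
    return sum(lo <= s < hi for s in samples) / len(samples)

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
samples = [rng.random() for _ in range(100_000)]  # uniform draws on [0, 1)

delta = 0.1
starts = (0.0, 0.35, 0.9)  # three subintervals of equal length delta
fractions = [fraction_in_interval(samples, s, s + delta) for s in starts]

for s, f in zip(starts, fractions):
    print(f"[{s:.2f}, {s + delta:.2f}): observed share {f:.4f}")
```

Each observed share should come out close to delta, whichever subinterval is
used, up to sampling noise.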

Let's say I collect a sample of size 100 from the two uniformly distributed
random variables x1 and x2 and I compute y = x1 + x2. In this sample y is
caused by x1 and x2, the way I understand it. (Do you agree?)

I don't want to give the entire sample here, but let's say it looks like this:

Row     x1      x2      y
  1       0.47   0.15   0.62
  2       0.71   0.43   1.14
  3       0.77   0.87   1.64
...
100     0.50   0.74   1.24

Now suppose I take a subsample from this, for instance,
I select rows 2, 7, 16, 33, 39, 54, 66, 71, 90, 99.
In this subsample, is y still caused by x1 and x2 or not?
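
To make the setup concrete, here is a minimal sketch in Python under the same
assumptions: x1 and x2 are independent uniform draws on [0,1), with
random.random() standing in for the Excel function, and the seed is arbitrary.
The generated values are not the ones in the table above. The sketch only
checks that the arithmetic relation y = x1 + x2 holds in the selected rows; it
does not by itself settle what we mean by "cause".

```python
import random

rng = random.Random(42)  # fixed seed (arbitrary) so the run is repeatable

# Full sample: 100 rows of (row number, x1, x2, y) with y built as x1 + x2.
sample = []
for row in range(1, 101):
    x1 = rng.random()
    x2 = rng.random()
    sample.append((row, x1, x2, x1 + x2))

# The subsample uses the row numbers listed in the message.
chosen_rows = [2, 7, 16, 33, 39, 54, 66, 71, 90, 99]
subsample = [rec for rec in sample if rec[0] in chosen_rows]

# Difference between the stored y and x1 + x2 for each selected row.
residuals = [y - (x1 + x2) for _, x1, x2, y in subsample]

for row, x1, x2, y in subsample:
    print(f"{row:3d}  {x1:.2f}  {x2:.2f}  {y:.2f}")
```

Selecting rows does not change how any individual y was constructed, so every
residual in the subsample is zero.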

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================