Gus,

You seem to be playing the game "heads Gus wins, tails Bill loses." There is nothing dishonest about what I am saying. I describe a procedure that is very easy to do with any spreadsheet. The logic is there to support the results. When you say "admit," perhaps you should say that you see. No one is guilty here and no one is winning. We are scientists seeking truth, together. So you SEE that we can differentiate the cause from the effect, assuming that effects are combinations of causes... at least in simulations and analyses as I describe them and as you have replicated them. What you are seeing is what many people before you have seen and gone on to dismiss by reference to what they believe is my crazy personality, or some other nonsensical excuse for backing out when the results support CR. I trust that you will not do that, but I do not know if any demonstration or logic is capable of convincing you of the validity of CR. Can you be convinced, or are you simply trying to disprove me without any concern for the truth? I do not mean this offensively, but I do want to have some sense of where this conversation is headed.

I am afraid that I still do not know what you did. Remember, I am only a psychologist, not a mathematician. Would you mind giving us a small data set and step-by-step instructions, for example, as we might do with Excel? I am particularly concerned about the criteria you use to pick sets of values. I do not see the point in your sampling scheme, since all we need to do is sort the variables by the normally distributed ones and trim off all the data corresponding to the upper and lower tails of the normal distributions. This seems to be the most parsimonious thing to do. Furthermore, when the normally distributed variable is the hypothesized effect, we need not ever trim the tails off. The effect is supposed to be triangular.

It sounds to me that when you collect subsamples you are selecting y values somehow, so that you are building in additional dependencies between the collected x values and the y values. This is why I need to know exactly how you select the values to retain.

I am sure that had you failed to replicate the trimming experiment as I described it, you would have claimed the failure to be proof that CR does not measure causation. Heads Gus wins, tails Bill loses. Let's be explicit. I will win no matter what happens to CR, because I will better understand the truth. Everything both of us are doing is loaded. We are putting many assumptions and discoveries on the line by doing tests. It is the sort of thing real men do. But let us be explicit.

So define your terms, philosopher, and let the axe fall where it does.

Best,

Bill
  "Gus Gassmann" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]...

  [EMAIL PROTECTED] wrote:

    Gus,

    I am still not sure what you are doing. What is a bucket? The essence of
    what you seem to be claiming is that when we sample y to be uniform, then CR
    gives us the opposite results. You admit, however, that CR works with the
    usual approach.

  "Works" is a loaded word. I admit that when x1 and x2 are sampled to be uniform
  and you compute y = x1 + x2 and further compute the correlation of y with x1,
  both overall and over subsets of the x1 (lower, middle and upper), the
  correlation coefficients follow a predicted pattern.
  Now here is another stab at explaining what I did. First off, when you generate
  x1 uniformly on [a1,b1] and x2 uniformly on [a2,b2], then you can set up
  a grid on the rectangle [a1,b1] x [a2,b2], like this (it is intended to be
  viewed with a monospaced font):

  (a1,b2)                                       (b1,b2)
       ---- ---- ----       ----
      |    |    |    |     |    |
      |    |    |    | ... |    |
       ---- ---- ----       ----
      |    |    |    |     |    |
      |    |    |    |     |    |
       ---- ---- ----       ----

        ...

       ---- ---- ----       ----
      |    |    |    |     |    |
      |    |    |    | ... |    |
       ---- ---- ----       ----
  (a1,a2)                                       (b1,a2)

  If the distribution is uniform, you would expect each of the smaller
  rectangles to contain roughly the same number of points (provided
  the sample is large enough). My idea was to force this relationship
  by _picking_ one set of values in each tiny rectangle. If you have
  10000 tiny rectangles (100 x 100), that gives you a pretty good
  approximation to a uniform distribution (with 10000 data points).
  (Do you agree?)
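[The one-set-of-values-per-rectangle idea can be sketched like this; a 50 x 50 grid stands in for the 100 x 100 one so the run stays quick, and the sample size is an arbitrary choice, made large enough that every cell should be hit:

```python
import random

random.seed(2)
a1, b1, a2, b2 = 0.0, 1.0, 0.0, 1.0  # the rectangle [a1,b1] x [a2,b2]
k = 50                                # grid of k x k tiny rectangles
n = 200_000                           # large enough that every cell is hit

pts = [(random.uniform(a1, b1), random.uniform(a2, b2)) for _ in range(n)]

# Keep the first point that lands in each cell: one set of values per bucket.
buckets = {}
for u, v in pts:
    i = min(int((u - a1) / (b1 - a1) * k), k - 1)
    j = min(int((v - a2) / (b2 - a2) * k), k - 1)
    buckets.setdefault((i, j), (u, v))

sub = list(buckets.values())
print(len(sub))  # k*k = 2500 once every bucket has been filled
```

With one retained point per cell, the subsample is uniform over the rectangle by construction, up to the position of each point inside its own cell.]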

  So then I set out to find a subsample of the big one that was uniform
  in x2 and y (where y was computed earlier as y = x1 + x2). Here's the
  important question: Do you agree that the causality in the subsample
  should be the same as the causality in the overall sample?

  To my surprise and amazement, I did not find the pattern I predicted.
  In fact, what this smaller sample suggests is that x2 and y cause x1.
  In other words, different selection of the data points results in a
  different causal relationship. This should not happen.


=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================
