# RE: Observation selection effects

```
>-----Original Message-----
>From: Jesse Mazer [mailto:[EMAIL PROTECTED]
>Sent: Tuesday, October 05, 2004 11:01 PM
>To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>Subject: RE: Observation selection effects
>
>
>>>-----Original Message-----
>>>From: Jesse Mazer [mailto:[EMAIL PROTECTED]
>>>Sent: Tuesday, October 05, 2004 8:45 PM
>>>To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
>>>Subject: RE: Observation selection effects
>>>
>>>If the range of the smaller amount is infinite,
>>>as in my P(x)=1/e^x
>>>example, then it would no longer make sense to say that
>>>the range of the
>>>larger amount is r times larger.
>>
>>Sure it does; r*inf=inf.  P(s)=exp(-x) -> P(l)=exp(-x/r)
>
>But it would make just as much sense to say that the
>second range is 3r
>times wider, since by the same logic 3r*inf=inf. In
>other words, this step
>in your proof doesn't make sense:
>
>>In other words, the range of possible
>>amounts is such that the larger and smaller amount do not overlap.
>>Then, for any interval of the range (x,x+dx) for the smaller
>>amount with probability p, there is a corresponding interval (r*x,
>>r*x+r*dx) with probability p for the larger amount.  Since the
>>latter interval is longer by a factor of r
>>
>>         P(l|m)/P(s|m) = r ,
>>
>>In other words, no matter what m is, it is r-times more likely to
>>fall in a large-amount interval than in a small-amount interval.
>
>As for your statement that "P(s)=exp(-x) ->
>P(l)=exp(-x/r)", that can't be
>true. It doesn't make sense that the value of the second
>probability
>distribution at x would be exp(-x/r), since the range of
>possible values for
>the amount in that envelope is 0 to infinity, but the
>integral of exp(-x/r)
>from 0 to infinity is not equal to 1, so that's not a
>valid probability
>distribution.
>
>Also, now that I think more about it I'm not even sure
>the step in your
>proof I quoted above actually makes sense even in the
>case of a probability
>distribution with finite range. What exactly does the equation
>"P(l|m)/P(s|m) = r" mean, anyway?
```
```
For any given amount of money, m, found in the first envelope, it
is more probable by a factor of r that it came from the Larger
envelope - where "probable" means degree of rational belief, not
fraction in a statistical ensemble.

>It can't mean that if
>I choose an envelope
>at random, before I even open it I can say that the
>amount m inside is r
>times more likely to have been picked from the larger
>distribution, since I
>know there is a 50% chance I will pick the envelope
>whose amount was picked
>from the larger distribution. Is it supposed to mean
>that if we let the
>number of trials go to infinity and then look at the
>subset of trials where
>the envelope I opened contained m dollars, it is r times
>more likely that
>the envelope was picked from the larger distribution on
>any given trial?
>This can't be true for every specific m--for example, if
>the smaller
>distribution had a range of 0 to 100 and the larger had
>a range of 0 to 200,

But the whole point is that there is no "specific m" from which you
can reason.

>if I set m=150, then in every single trial where I found
>150 dollars in the
>envelope it must have been selected from the larger
>distribution. You could
>do a weighted average over all possible values of m,
>like "integral over all
>possible values of m of P('I found m dollars in the envelope I
>selected')*P('the envelope I selected had an amount
>taken from the smaller
>distribution' | 'I found m dollars in the envelope I
>selected'), which you
>could write as "integral over m of P(m)*P(s|m)", but I
>don't think it would
>be true that the ratio "integral over m of
>P(m)*P(l|m)"/"integral over m of
>P(m)*P(s|m)" would be equal to r, in fact I think both
>integrals would
>always come out to 1/2 so the ratio would always be
>1...and even if I'm
>wrong, replacing P(l|m)/P(s|m) with this ratio of
>integrals would mess up
>
>Jesse

No, it doesn't depend on assuming a flat distribution for the
money, only for our knowledge (or on our acceptance of the problem as
stated).  Here's the more explicit (but less intuitive) proof - I
hope the formatting doesn't get chopped up too much by your mail

Without loss of generality, we can describe our prior density
functions for the amounts in the two envelopes in terms of a
density function, fo(x), the ratio r of the larger amount to the
smaller, and a scale factor, k.  Let L be the event that the
envelope with the larger amount is picked and S the event that the
envelope with the smaller amount is picked.  Then our prior
density functions for the amount m in the envelope are:

For the smaller amount our prior is:    f(m|S k) = k fo(km)
and for the larger amount:              f(m|L k) = (k/r) fo(km/r)

Our uncertainty about the scale factor, k, is described by a
density g(k).  So

f(m|S) = INT k fo(km) g(k) dk

where INT denotes the integral from zero to infinity, and

f(m|L) = INT (k/r) fo(km/r) g(k) dk

Now in the first equation make a change of variable in the
integral by y=km

f(m|S) = INT (y/m) fo(y) g(y/m) dy/m
       = (1/m^2) INT y fo(y) g(y/m) dy

and in the second change the variable of integration by x=km/r

f(m|L) = INT (x/m) fo(x) g(rx/m) (r/m) dx
       = (r/m^2) INT x fo(x) g(rx/m) dx

Now if we assume no prior knowledge of the scale of the amounts,
we will take g(k) to be a flat (improper) density, so g(y/m) and
g(rx/m) are the same constant and the last two integrals are
equal (provided fo has a finite mean); whence

f(m|L)/f(m|S) = r

But, by Bayes' theorem,

f(m|L) = P(L|m) f(m)/P(L)

so
P(L|m) = f(m|L) P(L)/f(m)
       = f(m|L) P(L)/[f(m|L) P(L) + f(m|S) P(S)]

using P(L) = P(S), i.e. equal prior probability of selecting the
larger or the smaller

P(L|m) = f(m|L)/[f(m|L) + f(m|S)]

Then dividing numerator and denominator by f(m|S)

P(L|m) = r/[r + 1]

and     P(S|m) = 1 - P(L|m) = 1/[r + 1]

So the expected value of switching is

<switch> = P(L|m) m/r + P(S|m) rm
         = [r/(r+1)] m/r + [1/(r+1)] rm
         = m

which is the same as not switching and keeping the amount m found
in the first envelope; so there is no paradox.  Note that if (as
would be the case in a real instance) we do suppose we know
something about the scale of the amounts, i.e. our prior for g(k)
is not actually flat, then we will expect a gain from switching if
we see an amount m that is toward the low end of our prior and we
will not expect a gain if the amount we see is high.  We do not
have the paradox of wanting to switch even before we see the
amount in the first envelope selected.

Brent

```
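
The derivation above can be checked numerically. The following is only a rough sketch under assumptions that are not part of the original message: it takes fo(x) = exp(-x) (borrowed from Jesse's example) and stands in for the improper flat density with g(k) uniform on [0, K], where the cutoff K and the function names `partial_mean`, `density_ratio` and `expected_switch` are hypothetical choices made for illustration.

```python
import math

def partial_mean(t):
    """Integral of y*exp(-y) from 0 to t, i.e. INT y fo(y) dy cut off at t."""
    return 1.0 - math.exp(-t) * (1.0 + t)

def density_ratio(m, r, K):
    """f(m|L)/f(m|S) for fo(x) = exp(-x) and g(k) uniform on [0, K].

    ASSUMPTION: both fo and the uniform g are illustrative choices.
    g(y/m) is nonzero for y <= K*m and g(r*x/m) for x <= K*m/r, so the
    two integrals in the derivation share an integrand and differ only
    in their upper limits.
    """
    return r * partial_mean(K * m / r) / partial_mean(K * m)

def expected_switch(m, r, K):
    """Expected amount in the other envelope after seeing m."""
    ratio = density_ratio(m, r, K)
    p_larger = ratio / (1.0 + ratio)   # P(L|m), using P(L) = P(S)
    p_smaller = 1.0 - p_larger         # P(S|m)
    return p_larger * m / r + p_smaller * r * m

if __name__ == "__main__":
    r = 2.0
    # Large K*m approximates the flat-prior limit; small K*m does not.
    for K, m in [(1e6, 1.0), (1.0, 1.0), (1.0, 0.01)]:
        print(K, m, density_ratio(m, r, K), expected_switch(m, r, K))
```

In this model, when K*m is large the cutoff is irrelevant, the density ratio approaches r, and the expected value of switching approaches m, matching the flat-prior result. When K*m is small, the observed amount sits at the low end of what the prior allows, the ratio falls toward 1/r, and switching is expected to gain, in line with the closing remark about a non-flat g(k).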