On Fri, Mar 6, 2020 at 8:08 AM 'Brent Meeker' via Everything List <
[email protected]> wrote:

> On 3/5/2020 2:45 AM, Bruce Kellett wrote:
>
>
> Now sequences with small departures from equal numbers will still give
> probabilities within the confidence interval of p = 0.5. But this
> confidence interval also shrinks as 1/sqrt(N) as N increases, so these
> additional sequences do not contribute a growing number of cases giving p ~
> 0.5 as N increases. So, again within factors of order unity, the proportion
> of sequences consistent with p = 0.5 decreases without limit as N
> increases. So it is not the case that a very large proportion of the binary
> strings will report p = 0.5. The proportion lying outside the confidence
> interval of p = 0.5 is not vanishingly small -- it grows with N.
>
>
> I agree with your argument about unequal probabilities, in which all the
> binomial sequences occur anyway, leading to inference of p=0.5.  But in the
> above paragraph you are wrong about how the probability density
> function of the observed value changes as N->oo.  For any given interval
> around the true value, p=0.5, the fraction of observed values within that
> interval increases as N->oo.  For example in N=100 trials, the proportion
> of observers who calculate an estimate of p in the interval (0.45 0.55) is
> 0.68.  For N=500 it's 0.975.  For N=1000 it's 0.998.
>
> Confidence intervals are constructed to include the true value with some
> fixed probability.  But that interval becomes narrower as 1/sqrt(N).
> So the proportion lying inside and outside the interval is relatively
> constant, but the interval gets narrower.
>
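The figures quoted above can be checked with a quick sketch using only the
normal approximation and the fixed interval (0.45, 0.55) from the thread
(illustrative only; nothing here beyond the standard erf formula):

```python
# Normal approximation to the binomial: fraction of observers whose
# estimate p_hat = k/N falls within the FIXED interval (0.45, 0.55)
# around the true value p = 0.5, for growing N.
import math

def fraction_within(N, half_width=0.05, p=0.5):
    sd = math.sqrt(p * (1 - p) / N)    # std. dev. of p_hat
    z = half_width / sd                # interval width in sd units
    return math.erf(z / math.sqrt(2))  # P(|p_hat - p| < half_width)

for N in (100, 500, 1000):
    print(N, round(fraction_within(N), 3))
# -> 100 0.683, 500 0.975, 1000 0.998 (matching the figures above)
```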


I think I am beginning to see why we are disagreeing on this. You are using
the normal approximation to the binomial distribution for a large sequence
of trials with some fixed probability of success on each trial. In other
words, it is as though you consider the 2^N binary strings of length N to
have been generated by some random process, such as coin tosses or the
like, with some prior fixed probability value. Each string is then
constructed as though the random process takes place in a single world, so
that there is only one outcome for each toss.

Given such an ensemble, the statistics you cite are undoubtedly correct: as
the length of the strings increases, the proportion of strings whose
estimated probability lies within some fixed interval around the given value
increases -- that is what the normal approximation to the binomial gives
you. And as N increases, the confidence interval shrinks as 1/sqrt(N), so
the proportion lying within a confidence interval stays approximately
constant. But note that these are proportions for strings generated with
some fixed probability value. If you take an ensemble of such strings, the
result is even more apparent: the proportion of strings in which the
estimated probability deviates significantly from the prior fixed value
decreases without limit.
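The two behaviours for this fixed-probability picture can be put side by
side in a short sketch: the 95% interval's half-width shrinks as 1/sqrt(N),
while the proportion of generated strings falling inside it stays ~0.95 by
construction (z = 1.96 is the usual 95% normal quantile; illustrative only):

```python
# Fixed-probability ensemble: confidence-interval half-width vs coverage.
import math

Z95 = 1.96  # 95% normal quantile
for N in (100, 1000, 10000):
    half_width = Z95 * math.sqrt(0.25 / N)   # z * sd of p_hat at p = 0.5
    coverage = math.erf(Z95 / math.sqrt(2))  # constant, ~0.95, by design
    print(N, round(half_width, 4), round(coverage, 3))
# The half-width column falls by 1/sqrt(10) per row; coverage does not move.
```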

That is all very fine. The problem is that this is not the ensemble of
strings that I am considering!

The set of all possible bit strings of length N is not generated by some
random process with some fixed probability. The set is generated entirely
deterministically, with no mention whatsoever of any probability. Just
think about where these strings come from. You measure the spin of a
spin-half particle. The result is 0 in one branch and 1 in the other. Then
the process is repeated, independently in each branch, so the 1-branch
splits into a 11-branch and a 10-branch; and the 0-branch splits into a
01-branch and a 00-branch. This process goes on for N repetitions,
generating all possible bit strings of length N in an entirely
deterministic fashion. The process is illustrated by Sean Carroll on page
134 of his book.
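The splitting process just described can be sketched directly -- each
existing branch splits into a 0-branch and a 1-branch at every step, with no
probability appearing anywhere (an illustrative sketch, not Carroll's
figure):

```python
# Deterministic branching: after N measurements, every one of the 2^N
# bit strings exists exactly once -- no random process involved.

def branches(n):
    strings = ['']
    for _ in range(n):
        # every branch splits into a 0-branch and a 1-branch
        strings = [s + b for s in strings for b in ('0', '1')]
    return strings

N = 10
all_branches = branches(N)
assert len(all_branches) == 2 ** N   # every string occurs exactly once

# Fraction of branches with exactly equal numbers of 0s and 1s:
balanced = sum(1 for s in all_branches if s.count('1') == N // 2)
print(balanced / len(all_branches))
# -> 0.24609375 (= C(10,5)/2^10); this fraction falls off roughly
#    as sqrt(2/(pi*N)) as N grows.
```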

Given the nature of the ensemble of bit strings that I am considering, the
statistical results I quote are correct, and your statistics are completely
inappropriate. This may be why we have been talking at cross purposes. I
suspect that Russell has a similar misconception about the nature of the
bit strings under consideration, since he talked about statistical results
that could only have been obtained from an ensemble of randomly generated
strings.

Bruce
