"Gottfried Helms" <[EMAIL PROTECTED]> wrote in message
[EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
> Hi,
>
>  there was a tricky problem recently with the chi-square density
>  at higher df's.
>  I discussed that in sci.stat.consult and in a German newsgroup,
>  got some answers, and think I have understood the real point.
>
>  But I would like to have a smoother explanation, as I have to
>  deal with it in my seminars. Maybe someone out there has an idea
>  or a better shortcut for how to describe it.
>  To illustrate this I just copy&paste an exchange from s.s.consult;
>  I hope you forgive my laziness. On the other hand: maybe the
>  true point comes out better this way.
>
> Regards
> Gottfried
>
>
> 3 postings follow:
> ---(1/3)-------------------------------
> [Gottfried]
> > Hi -
> >
> >    I'm stumbling in the dark... perhaps only missing some
> >    simple hint.
> >    I'm trying to explain the concept of significance of the
> >    deviation of an empirical sample from a given, expected
> >    distribution.
> >    If we discuss the chi-square distribution
> >      |
> >      |*
> >      | *
> >      | *
> >      |  *
> >      |   *
> >      |     *
> >      |          *
> >      |                        *
> >      +---------------------------------
> >
> >    then this graph illustrates very well that, and how, a
> >    small deviation is more likely to happen than a large
> >    deviation - thus backing the concept of the 95th percentile
> >    etc. in the beginners' literature.
> >    Cutting this curve into equal slices gives us the expected
> >    frequencies of occurrence of samples with individual
> >    chi-squared deviations from the expected counts.
> >
> >    With more df's the curve changes its shape; here is a
> >    5-df curve for samples of thrown dice, where I count the
> >    frequency of occurrence of each number and the deviation
> >    of these frequencies from uniformity.
> >
> >      |
> >      |
> >      |
> >      |
> >      |            *
> >      |          *    *
> >      |        *            *
> >      |     *                        *
> >      |  *                                         *
> >      +-------------------------------------------------
> >      0            X²(df=5)
> >
> >      Now the slices with the highest frequency of occurrence
> >      are not the ones with the smallest deviation from the
> >      expected distribution (X²=0) - and even if I accept that
> >      this at least holds for the cumulative distribution, it is
> >      suddenly no longer "self-explaining". It agrees with
> >      reality, but our common language says otherwise:
> >      the most likely chi-square deviation from uniformity
> >      is now a region which is not at the zero mark.
> >      So, now: do we EXPECT a deviation from uniformity?
> >      That the counts of the occurrences of the 6 dice
> >      numbers are most likely NOT uniform? Huh?
> >      Is this suddenly the null hypothesis? And do we calculate
> >      the deviation of our empirical sample then from this new
> >      null hypothesis???
> >
> >      I never thought about it in this way, but now that I do,
> >      I feel a bit confused; maybe I only have to step back
> >      a bit?
> >      Any good hint appreciated -
> >
> > Gottfried.
> >
> ----------------------------------------------------------------------
>
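A quick way to see the effect described in posting (1/3) is to simulate
it. The following Python sketch (my own illustration, not from the
thread; sample sizes and variable names are arbitrary choices) draws
many samples of fair-dice throws, computes each sample's chi-square
statistic against uniformity, and reports which interval of X² values
is most populated - it is not the one at zero:

    import numpy as np

    rng = np.random.default_rng(0)
    n_samples = 10000    # repeated experiments
    n_throws = 600       # dice throws per experiment
    expected = n_throws / 6.0

    stats = []
    for _ in range(n_samples):
        faces = rng.integers(1, 7, size=n_throws)          # fair die
        observed = np.bincount(faces, minlength=7)[1:]     # counts of 1..6
        stats.append(np.sum((observed - expected) ** 2 / expected))

    hist, edges = np.histogram(stats, bins=30, range=(0, 15))
    k = hist.argmax()
    print("most populated X² interval:", (edges[k], edges[k + 1]))

The most populated interval comes out near X² = 3 (the mode of a
chi-square density with 5 df is at df - 2 = 3), not near zero - exactly
the shape of the second ASCII graph above.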
> ---(2/3)---------------------------------------
> Then one participant answered:
>
> > Actually, that corresponds to the notion that if a "random" sequence is
> > *too* uniform, it isn't really random.  For example, if you were to toss
> > a coin 1000 times, you'd be a little surprised if you got *exactly* 500
> > heads and 500 tails.  If you think in terms of taking samples from a
> > multinomial population, the non-monotonicity of the chi-square density
> > means that a *small* amount of sampling error is more probable than *no*
> > sampling error, as well as more probable than a *large* sampling error,
> > which I think corresponds pretty well to our intuition.
> >
>
> -------------------------------------------------------------------
>
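To put a number on the coin example in posting (2/3): the outcome of
exactly 500 heads is indeed the single most likely one, yet it is still
rare. A minimal check (Python, exact binomial computation; my own
illustration):

    from math import comb

    p = comb(1000, 500) / 2 ** 1000
    print(p)   # about 0.0252

So even the most probable individual count shows up in only about 2.5%
of all runs of 1000 tosses; some nonzero amount of sampling error is
far more likely than none at all.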
> --(3/3)-----------------------------------------
> I was not really satisfied with this and answered, after I had
> gained some more insight:
>
> [Gottfried]
> >   [xxxx] wrote:
> > > Actually, that corresponds to the notion that if a "random" sequence
> > > is *too* uniform, it isn't really random.  For example, if you were
> > > to toss a coin 1000 times, you'd be a little surprised if you got
> > > *exactly* 500 heads and 500 tails.  If you think in terms of taking
> > > samples from a
> >
> >
> > Yes, this is true. But the same holds for every other combination.
> > No single one is more likely to occur (or should one better say:
> > variation?). But then, a student would ask, how can you still call
> > a near-expected variation more likely than a far-from-expected
> > variation in general?
> >
> > The reason is that we don't argue about a specific variation,
> > but about properties of a variation, or in this case, of a combination.
> > We commonly select the property of "having a distance from the
> > expected variation", measured in terms of squared deviation.
> > The trouble is that with this criterion, in a multinomial
> > configuration, there are plenty of variations sharing the same
> > combinatorial distance in terms of the squared deviation - up to
> > a local maximum.
> > My difficulty is making this clear in simple words; best in words
> > as simple as those I used when I explained the rationale of
> > chi-square and significance...
> > Ok, maybe it's more a subject for news://sci.stat.edu, I guess.
> >
> > Thanks again for your input -
> >
> > Gottfried Helms.
>
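The combinatorial point made in posting (3/3) - that many distinct
variations share the same squared-deviation distance - can be made
concrete by brute force. A small Python sketch (my own illustration;
six throws keep the enumeration tiny, with expected count 1 per face):

    from itertools import product
    from collections import Counter

    by_chisq = Counter()
    for seq in product(range(1, 7), repeat=6):      # all 6^6 sequences
        counts = [seq.count(face) for face in range(1, 7)]
        chisq = sum((c - 1) ** 2 for c in counts)   # expected = 1 per face
        by_chisq[chisq] += 1

    total = 6 ** 6
    for x2, n in sorted(by_chisq.items()):
        print(f"X² = {x2:2d}: {n:6d} of {total} sequences ({n/total:.1%})")

Only 720 of the 46656 sequences (about 1.5%) are perfectly uniform
(X² = 0), while 10800 of them (about 23%) share the small nonzero value
X² = 2: the nonzero distance is more likely simply because far more
variations realize it.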
    For the smoother explanation that you want, point out that a chi-squared
rv on n degrees of freedom is the sum of (at least) n observations.
(Sometimes n+r observations with r linear constraints.) Your students will
recognize this as the way they always calculate it.
    If we add n random variables, each of which is positive, the most likely
value of the sum will obviously increase as n increases, and the probability
of very small values will decrease. In addition, the Central Limit Theorem
shows that the density of the sum approaches a normal density.
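A minimal simulation of that construction (Python with numpy; the sample
size and bin count are arbitrary choices of mine) shows both effects at
once - the mode moving away from zero and the shape becoming normal.
Here chi-square(df) is built in the standard way, as a sum of df squared
standard normals:

    import numpy as np

    rng = np.random.default_rng(1)
    for df in (1, 2, 5, 30):
        # chi-square(df) as a sum of df squared standard normals
        x = np.sum(rng.standard_normal((100000, df)) ** 2, axis=1)
        hist, edges = np.histogram(x, bins=60)
        k = hist.argmax()
        mode = 0.5 * (edges[k] + edges[k + 1])
        print(f"df={df:2d}: mode ~{mode:5.2f}  mean={x.mean():6.2f}  "
              f"sd={x.std():5.2f}")

For df >= 2 the sample mode tracks the theoretical value df - 2, the
mean is df and the standard deviation is sqrt(2*df), and by df = 30 the
histogram is already close to a normal curve.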



