Comparing histograms from continuous distributions

Jay Warner Fri, 22 Nov 2002 22:11:37 -0800

OK, folks, I've got a serious statistical question here, which goes
past my knowledge limit (I didn't say how difficult that is, just that
it is :)


Suppose I wish to compare some measurements against a theoretical
distribution.  I collect data, build a histogram, and compute the
frequency expected within each 'class interval' of the histogram.
Then I do a Chi square test, comparing the expected with observed
frequencies, by count.  So long as n(expected each class) is >= 5,
life is good.  (Or was it n(obs) >= 5?)  There are specialized tests
that are more sensitive than a straight Chi Square, which I can dig
out if I need them.
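
For reference, the count-based procedure described above can be sketched as follows. This is only a sketch under stated assumptions: the theoretical distribution is taken to be a normal, and the class edges, counts, mean and standard deviation are all hypothetical illustration values, not data from this post. When the mean and standard deviation are fixed in advance, the degrees of freedom are the number of classes minus 1; each parameter estimated from the data costs one more.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def chi_square_counts(edges, observed, mean, sd):
    """Return (statistic, expected counts) for binned counts vs. a normal."""
    n = sum(observed)
    expected = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Probability mass the assumed normal puts in this class interval.
        p = normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)
        expected.append(n * p)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

# Hypothetical data: 6 class intervals, 100 observations in total.
edges = [-math.inf, -1.0, -0.5, 0.0, 0.5, 1.0, math.inf]
observed = [14, 17, 20, 18, 16, 15]
stat, expected = chi_square_counts(edges, observed, mean=0.0, sd=1.0)

# Classes that would violate the "expected count >= 5" rule of thumb.
small = [e for e in expected if e < 5]
print(stat, expected, small)
```

The statistic would then be compared against a chi-square table at the appropriate degrees of freedom.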

So how do I do the same thing, when the measurement in each class is
not a count?  I have a mass for each class, an amount on a continuous
scale, which I can also report as a percentage of the total.  I can
calculate an average and standard deviation for the entire histogram.
I can calculate an expected frequency for a theoretical distribution
with the same average and standard deviation.  I can report both
observed and expected frequencies in percent of total, stick this into
a Chi Square, and crunch the numbers.
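
The calculation in the paragraph above might look like the sketch below. It assumes a normal as the theoretical distribution and treats each class as concentrated at its midpoint when computing the average and standard deviation; the class edges and masses are hypothetical, and the helper names are made up for illustration.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def weighted_moments(midpoints, fractions):
    """Mass-weighted mean and standard deviation of the class midpoints."""
    total = sum(fractions)
    mean = sum(m * f for m, f in zip(midpoints, fractions)) / total
    var = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, fractions)) / total
    return mean, math.sqrt(var)

def expected_fractions(edges, mean, sd):
    """Fraction of an assumed normal falling inside each class interval."""
    return [normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)
            for lo, hi in zip(edges[:-1], edges[1:])]

# Hypothetical class edges and mass measured in each class (any unit).
edges = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
mass = [0.4, 2.1, 5.0, 2.0, 0.5]

total = sum(mass)
fractions = [m / total for m in mass]
midpoints = [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]

mean, sd = weighted_moments(midpoints, fractions)
expected = expected_fractions(edges, mean, sd)
print(mean, sd, expected)
```

Note that with a finite measurable range the expected fractions sum to slightly less than 1, because the assumed normal also places some mass outside the outermost class edges.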

BUT, with the original Chi square I had to require n(expected) >= 5
because the value was integer, and discrete numbers being what they
are, it is not a safe thing to assume integers below 5 are all that
close to what is really theoretically expected.

Now I am calculating both expected and observed values as continuous
values with rather good precision.  The sum of the reported
percentages is within 0.07% of 100, possibly due to rounding error.
One of the expected values is 0.002%, and associated observed value is
even less.  (Let's not discuss measurement and rounding errors at this
point, OK!?)

Then on top of that, I first set up the chi square calculation in
units of percent.  I could multiply all the percent values by 100, or
1000, and repeat the whole calculation.  The percent values after all
are simply a fraction of mass times 100, reported to some precision.
When I multiply the percentages by 1, I "cannot detect a difference"
between the observed and expected distribution.  When I multiply by
1000 (fraction times 100,000), I crank up the calculated Chi Square
and yes, there is a difference.  Right.
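
The scaling effect follows directly from the form of the statistic: replacing every O and E by k*O and k*E gives sum((k*O - k*E)^2 / (k*E)) = k * sum((O - E)^2 / E), so the result grows in proportion to k. A small sketch with hypothetical percentages (not the values from this post) shows the same thing numerically.

```python
def chi_square(observed, expected):
    """Pearson-style sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical observed and expected values, in percent of total mass.
obs_pct = [4.0, 21.0, 50.0, 20.0, 5.0]
exp_pct = [5.0, 20.0, 48.0, 21.0, 6.0]

base = chi_square(obs_pct, exp_pct)

# Same data after multiplying every value by an arbitrary constant.
k = 1000
scaled = chi_square([k * o for o in obs_pct], [k * e for e in exp_pct])

ratio = scaled / base   # equals k, up to floating-point rounding
print(base, scaled, ratio)
```

This is why the sum can only be referred to a chi-square table when the entries are counts of independent observations: the count itself fixes the scale, and a mass or a percentage carries no such built-in scale.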

Intuitively, reporting the amount of mass observed in each class
doesn't seem to correct my problem.  I could change the units and get
different totals, the same way as multiplying percent by an arbitrary
constant.

So where am I going out of line?  Trying to use a Chi square test of
observed counts/frequencies for comparing a (nearly) continuous
distribution?

Would I be better off working out a mean sum of squares deviation kind
of thing for each histogram interval?
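
One way such a "mean sum of squares" measure could be set up is sketched below, assuming both histograms are first normalized to fractions of their own totals. The function name and the numbers are hypothetical. Because of the normalization, the result does not change when the mass units change.

```python
def mean_squared_deviation(observed, expected):
    """Mean over classes of the squared difference between the
    normalized observed and normalized expected fractions."""
    total_obs = sum(observed)
    total_exp = sum(expected)
    diffs = [o / total_obs - e / total_exp
             for o, e in zip(observed, expected)]
    return sum(d * d for d in diffs) / len(diffs)

# Hypothetical mass per class in grams, and the same data in kilograms.
obs_g = [0.4, 2.1, 5.0, 2.0, 0.5]
exp_g = [0.5, 2.0, 4.8, 2.1, 0.6]
obs_kg = [m / 1000.0 for m in obs_g]
exp_kg = [m / 1000.0 for m in exp_g]

msd_g = mean_squared_deviation(obs_g, exp_g)
msd_kg = mean_squared_deviation(obs_kg, exp_kg)
print(msd_g, msd_kg)   # the two values agree up to rounding
```

A number like this only describes how far apart the two histograms are. Turning it into a significance statement would still need some model of the measurement error in each class.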

How would you recommend that I make the comparison between an observed
and a theoretical distribution, with continuous observations?  And where
might I find a text that gives some examples, so I can puzzle it out?

BTW, why do I care, you might ask.  I need to establish a density
function within a measurable range, so I can assert loudly the maximum
amount of mass well away from the center of the distribution.  One
option is to find a solution to the density envelope function and
extrapolate that.  However, my loud assertions will not have to be so
loud if I can relate it back to a distribution expected on physical
grounds.

Any and all guidance and direction will be appreciated.

Jay
--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA

Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?

