OK, folks, I've got a serious statistical question here, which goes past my knowledge limit (I didn't say how difficult that is, just that it is :)
Suppose I wish to compare some measurements against a theoretical distribution. I collect data, build a histogram, and compute the frequency expected within each 'class interval' of the histogram. Then I do a chi-square test, comparing the expected with the observed frequencies, by count. So long as n(expected, each class) is >= 5, life is good. (Or was it n(obs) >= 5?) There are specialized tests that are more sensitive than a straight chi-square, which I can dig out if I need them.

So how do I do the same thing when the measurement in each class is not a count? I have a mass for each class, an amount on a continuous scale, which I can also report as a percentage of the total. I can calculate an average and standard deviation for the entire histogram. I can calculate an expected frequency for a theoretical distribution with the same average and standard deviation. I can report both observed and expected frequencies in percent of total, stick them into a chi-square, and crunch the numbers.

BUT, with the original chi-square I had to require n(expected) >= 5 because the values were integers, and discrete numbers being what they are, it is not safe to assume that integers below 5 are all that close to what is really theoretically expected. Now I am calculating both expected and observed values as continuous values with rather good precision. The sum of the reported percentages is within 0.07% of 100, possibly due to rounding error. One of the expected values is 0.002%, and the associated observed value is even less. (Let's not discuss measurement and rounding errors at this point, OK!?)

Then on top of that, I first set up the chi-square calculation in units of percent. I could multiply all the percent values by 100, or 1000, and repeat the whole calculation. The percent values, after all, are simply a fraction of mass times 100, reported to some precision. When I multiply the percentages by 1, I "cannot detect a difference" between the observed and expected distribution.
When I multiply by 1000 (fraction times 100,000), I crank up the calculated chi-square and yes, there is a difference. Right.

Intuitively, reporting the amount of mass observed in each class doesn't seem to correct my problem. I could change the units and get different totals, the same way as multiplying percent by an arbitrary constant.

So where am I going out of line? Is it in trying to use a chi-square test of observed counts/frequencies to compare a (nearly) continuous distribution? Would I be better off working out a mean sum of squares deviation kind of thing for each histogram interval? How would you recommend that I make the comparison between an observed and a theoretical distribution, with continuous observations? And where might I get a text that gives some examples, so I can puzzle it out?

BTW, why do I care, you might ask. I need to establish a density function within a measurable range, so I can assert loudly the maximum amount of mass well away from the center of the distribution. One option is to find a solution to the density envelope function and extrapolate that. However, my loud assertions will not have to be so loud if I can relate them back to a distribution expected on physical grounds.

Any and all guidance and direction will be appreciated.

Jay
--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216
USA
Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com
The A2Q Method (tm) -- What do you want to improve today?
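
P.S. A minimal sketch of the scaling problem described above, in Python. The percentages are invented for illustration and are not the measurements discussed in this post; the function name pearson_chi_square is likewise just a label chosen here. It only shows that the Pearson statistic moves with the arbitrary multiplier; it does not settle which test is appropriate for mass data.

```python
def pearson_chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over the class intervals."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical mass percentages per class interval (illustrative only,
# NOT the data from the post). Both lists sum to 100.
observed_pct = [4.0, 21.0, 48.0, 22.0, 5.0]
expected_pct = [5.0, 24.0, 42.0, 24.0, 5.0]

# Statistic computed directly in units of percent.
base = pearson_chi_square(observed_pct, expected_pct)

# Rescale both observed and expected by the same arbitrary constant,
# as in "multiply all the percent values by 100, or 1000".
for factor in (1, 100, 1000):
    scaled_obs = [factor * o for o in observed_pct]
    scaled_exp = [factor * e for e in expected_pct]
    stat = pearson_chi_square(scaled_obs, scaled_exp)
    print(f"factor {factor:>5}: statistic = {stat:12.3f}, "
          f"ratio to percent-scale value = {stat / base:.1f}")
```

Each term becomes (cO - cE)^2 / (cE) = c (O - E)^2 / E, so the statistic grows in direct proportion to the multiplier c while the degrees of freedom stay the same. That is why the "significance" can be dialed up or down just by changing units.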
