Eric Lund <[EMAIL PROTECTED]> wrote in sci.stat.edu:
> We really do not want to score all 1600 if we don't have to.
[snip]
> So, what I would like to know is, can we get an approximation for
> the population mean without having any baseline data, and if so,
> how many students would need to be sampled?

[Please post right-side-up and trim quotes; see
<http://web.presby.edu/~nnqadmin/nnq/nquote.html>.]

This prompts a further question. Putting the statistics aside for a
moment, I'll bet that if Indiana mandates that you develop the test
locally, the state also mandates what sort of data analysis you can do
and whether you're allowed to score only a sample of the tests. So be
sure to rule that out first.

That said, if you don't know the population standard deviation, then
you can't decide a priori how big a sample you need to estimate the
population mean with a desired level of confidence and a desired
margin of error.

What you _can_ do is score a reasonable number of them -- say 80
(which is 1/20 of the population), picked at random. Compute the mean
and standard deviation of that sample, then use Student's t to compute
a confidence interval. This is almost the first inferential procedure
in any basic statistics textbook (or you can do it easily on a TI-83
with STAT | TESTS | 8; it's a little more work in Excel).

You will end up with a statement of this form: "With __(a)__%
confidence, the mean score of all 1600 tests is __(b)__ +/- __(c)__."

  (a) is the confidence level you preselect (.95 or 95% is common,
      e.g. in political polls);
  (b) is the sample mean you computed;
  (c) is the margin of error, which comes out of the confidence
      interval procedure.

For instance, if the sample size is 80 and (a) is 95%, then (c) is
about .2225 times your sample standard deviation. [More formally, (c)
is inverse t for one-tailed area (1-95%)/2 = 0.025 and degrees of
freedom = (sample size)-1, times your sample standard deviation,
divided by the square root of your sample size. For a sample of 80
that is about 1.990 / 8.944.]

Now if your sample gives an unacceptably large margin of error, about
all you can do is repeat with a larger sample. But if the sample is
bigger than 5%-10% of the population, the assumptions that let you
compute a confidence interval as described above begin to break down.
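
For anyone who would rather check the arithmetic in code than on a
calculator, here is a minimal Python sketch of the t-interval
procedure described above. Treat it as an illustration under stated
assumptions, not as a required method: the function names (t_pdf,
t_cdf, t_critical, mean_confidence_interval) are invented for this
example, the 80 scores are placeholder numbers rather than real test
data, and the inverse t is found by numerical integration plus
bisection so that only the standard library is needed. A statistics
package would normally do that step in one call (scipy.stats.t.ppf,
for example).

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on the density between 0 and |x|."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(confidence, df):
    """Inverse t: the value whose upper-tail area is (1 - confidence) / 2."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_confidence_interval(scores, confidence=0.95):
    """Return (sample mean, margin of error) for a random sample of scores."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)
    margin = t_critical(confidence, n - 1) * sd / math.sqrt(n)
    return mean, margin


# Placeholder scores, invented purely for illustration (not real data).
scores = [60 + (7 * i) % 35 for i in range(80)]
mean, margin = mean_confidence_interval(scores, 0.95)
multiplier = t_critical(0.95, len(scores) - 1) / math.sqrt(len(scores))
print(f"margin = {multiplier:.4f} x sample standard deviation")
print(f"With 95% confidence, the mean is {mean:.2f} +/- {margin:.2f}")
```

With a sample of 80 and 95% confidence, the multiplier printed on the
first line should agree with the .2225 figure quoted above. The
sketch assumes a simple random sample that is a small fraction of the
population.
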

There are techniques for dealing with a sample that is a large
fraction of the population, but I don't understand them well enough to
explain them.

But again, I question whether this is legally acceptable. Remember all
the flap two years ago about using statistical methods on the US
Census figures? I'm not saying you're doing anything illegal, just
counseling you to be sure where you stand (unless of course you've
already investigated this legally).

-- 
Stan Brown, Oak Road Systems, Cortland County, New York, USA
http://OakRoadSystems.com/
"My theory was a perfectly good one. The facts were misleading."
    -- /The Lady Vanishes/ (1938)
