The "two question" question has been beaten to death, but I can't help sticking myself into this one. It's too old to ignore :)
When we compare two groups, i.e., one independent categorical variable with two levels and a near-continuous response on an interval or ratio scale, we have a z or t test on our hands. The 'party line' is that if sigma is 'known' you can use the z; if not, use t. And often, to help 'simplify' things, the book will say that when the sample size is over 30, you can use z anyway - the estimate of sigma is 'close enough.' Please don't sidetrack now on the population size - it is most often effectively infinite.

What is meant by 'known sigma'? It means that you have measured a heck of a lot of items (30 or more, if you believe the book) to estimate sigma of the population, or you have been handed sigma by someone else who has made the measurements. In a manufacturing environment, that someone is typically the QA dept., if they follow this logic. In typical business-related analyses the origins of 'known' sigmas may be less clear.

How good (accurate, precise, unbiased) is this estimate of sigma? When it comes from someone else, we don't know, and often can't know. So we call it 'known' and proceed. But the number we have is still an estimate of the population sigma, which in principle we can never measure exactly (in most cases). And often, we have no way of assessing how well that number applies to the current situation. Did the original sigma estimate come from the same defined population as the current data? Did these samples use a different paint? Did this survey contact people from the same part of town (a statistically identical group of people)?

May I try a different dichotomy?

When I am given the estimated sigma from somewhere else, so that I can't assess its validity, I call this an _external_ estimate of sigma. I accept it with the usual reservations, and proceed to do the problem. The equation will be of the z test type.

When I am not handed sigma on a platter, but must estimate it from the sample data, I call that an _internal_ estimate of sigma.
That estimate definitely applies to the population at hand, to be considered & evaluated as ever. The equation will be a t test.

Notice: no n >= 30. As some mentioned recently, with machines we can compute p-values for t tests, which was not feasible in the dark past without the machines. Therefore, there is no need to decide when n is 'close enough' to use a z test.

Using this method to split between the two test types, there is less confusion, and we can retain a healthy skepticism about the validity of that external estimate of sigma.

What are the holes in this approach?

Cheers,
Jay

"Robert J. MacG. Dawson" wrote:

> Paul Bernhardt wrote:
>
> > Now, if we can only get over the arbitrariness of the n<30 cut-off for
> > use of t vs z and teach: use z when you know sigma and t when you don't.
> > (Triola, as much as I like some of its choices, still retains this) <sigh>
>
> How about (at least for social & health sciences): use t when you don't
> know sigma, and speak to a statistician when you think you do?
>
> -Robert Dawson
>
> =================================================================
> Instructions for joining and leaving this list, remarks about the
> problem of INAPPROPRIATE MESSAGES, and archives are available at:
> http://jse.stat.ncsu.edu/
> =================================================================

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
4444 North Green Bay Road
Racine, WI 53404-1216 USA
Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?
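
A minimal sketch of the external/internal split described in this post, in Python using only the standard library. Everything named here is an assumption for illustration and not from the thread: the helper names `t_upper_tail` and `two_group_test`, the choice of a pooled-variance (equal-variance) two-sample t test, the Simpson's-rule integration used to get the t tail area, and the sample numbers.

```python
import math
from statistics import NormalDist, mean, variance


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t with df degrees of freedom (t >= 0).

    Integrates the t density from 0 to t with Simpson's rule (steps must
    be even) and subtracts the result from 0.5.
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 0.5 - total * h / 3)


def two_group_test(a, b, external_sigma=None):
    """Two-sided test of equal means for two groups.

    external_sigma given -> z test (sigma handed to us from elsewhere).
    external_sigma None  -> pooled t test (sigma estimated internally).
    No n >= 30 rule appears anywhere.
    Returns (test kind, statistic, p-value).
    """
    diff = mean(a) - mean(b)
    scale = math.sqrt(1 / len(a) + 1 / len(b))
    if external_sigma is not None:
        z = diff / (external_sigma * scale)
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        return "z", z, p
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    t = diff / (math.sqrt(pooled_var) * scale)
    p = 2 * t_upper_tail(abs(t), df)
    return "t", t, p


if __name__ == "__main__":
    # Made-up measurements, purely for illustration.
    group_a = [5.1, 4.9, 5.3, 5.0]
    group_b = [4.6, 4.8, 4.7, 4.5]
    print(two_group_test(group_a, group_b))                      # internal estimate
    print(two_group_test(group_a, group_b, external_sigma=0.2))  # external estimate
```

The only decision the caller makes is whether a sigma was supplied from outside. For the same statistic the z p-value comes out smaller than the t p-value, which is one reason to stay skeptical about an external sigma that cannot be checked.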
