[EMAIL PROTECTED] (Michael Eglinton) wrote in news:[EMAIL PROTECTED]:
> Hello all
>
> I work for an organisation that receives counts of all 'notifiable'
> diseases from around NZ.
>
> I would like to compare this year's figures with last year's, and in
> the past we have used the Mantel-Haenszel chi-square test to test if
> there has been a linear relationship between years (thus indicating a
> change). However, we have a problem with this test when cell counts
> are small.
>
> But my question is:
>
> Should I even be using this test at all? To my mind we are using
> population data, so any differences are real and therefore
> significant. We do have non-sampling error in our estimates (e.g.
> people who do not go to a doctor, and some diseases are not always
> recorded because of the number of cases received, e.g. campy in
> Auckland), but I do not believe that we have a good handle on the
> size of this error, and it may be similar from year to year.

I think the real question your analysis has to answer is not "are there
any differences?" (as you point out, the answer is plainly yes) but
rather "do the differences represent a change in the parameters of the
stochastic process that's generating the counts?" In other words, given
what you know about the variability of the disease-propagation process,
is the year-to-year fluctuation within the expected range of
variability, or outside it?

This really looks more like a statistical process control problem than
a hypothesis-testing one. I'd be thinking in terms of using historical
data for each disease to establish tolerance intervals for incidence
rates (I'd use rates rather than counts to remove the effect of
population growth). It *might* be possible to use a Poisson model,
though I'd definitely check your historical data for overdispersion,
since the incidence of many contagious diseases won't be strictly
Poisson (e.g. the probability of the next person being diagnosed with
influenza in an interval anchored shortly after the last diagnosis is
going to be higher than in an interval anchored long after it). The
first sketch at the end of this message shows roughly what I have in
mind.

In short, what you're really doing is testing how well a model (in this
case, one that says that incidence is a constant plus "random noise" of
known mean and variance) fits your observed data. And you don't have to
be sampling to do that. The second sketch at the end illustrates that
framing.

Standard methods of time-series analysis are probably also applicable
here.
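
Here's a minimal sketch of the control-chart idea in Python. The counts
and populations below are entirely made up, and the variance-to-mean
check is only a crude screen for overdispersion; the point is the shape
of the calculation, not a finished implementation.

import numpy as np

# Hypothetical historical data: notified cases per year and mid-year
# populations for one disease (figures invented for illustration).
counts      = np.array([112, 98, 121, 107, 115, 103, 126, 110])
populations = np.array([3.60, 3.63, 3.66, 3.70, 3.73, 3.77, 3.80, 3.84]) * 1e6

rates = counts / populations * 1e5          # cases per 100,000 population

# Rough overdispersion check: under a Poisson model the variance of the
# counts should be close to their mean, so a ratio well above 1 suggests
# extra-Poisson variability (contagion, reporting artefacts, ...).
dispersion = counts.var(ddof=1) / counts.mean()
print(f"variance/mean ratio: {dispersion:.2f}")

# Control limits for this year's rate: Poisson-based if the counts look
# roughly Poisson, otherwise fall back on the empirical spread of the
# historical rates.
mean_rate = rates.mean()
if dispersion < 1.5:                        # crude threshold, a judgement call
    se = np.sqrt(mean_rate * 1e5 / populations.mean())   # Poisson SE of the rate
else:
    se = rates.std(ddof=1)                  # empirical SE allows overdispersion

lower, upper = mean_rate - 3 * se, mean_rate + 3 * se

this_year_rate = 134 / 3.88e6 * 1e5         # hypothetical current-year figures
flag = "within" if lower <= this_year_rate <= upper else "outside"
print(f"this year: {this_year_rate:.1f} per 100,000 "
      f"({flag} control limits {lower:.1f} to {upper:.1f})")

If the historical counts turn out to be badly overdispersed, a negative
binomial model, or simply wider empirically derived limits, would be
the more defensible route than the textbook Poisson limits.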

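And a second sketch of the model-fit framing: under a model in which
every year shares the same underlying incidence rate and the counts are
Poisson, the usual chi-square statistic measures how far the observed
yearly counts sit from what that model predicts. Again, the figures are
invented, and the approximation assumes the expected counts are
reasonably large.

import numpy as np
from scipy.stats import chi2

# Hypothetical yearly counts and populations, including the latest year.
counts      = np.array([112, 98, 121, 107, 115, 103, 126, 110, 134])
populations = np.array([3.60, 3.63, 3.66, 3.70, 3.73, 3.77, 3.80, 3.84, 3.88]) * 1e6

# Expected counts if every year shared the same underlying incidence rate.
common_rate = counts.sum() / populations.sum()
expected = common_rate * populations

# Chi-square statistic on (number of years - 1) degrees of freedom.
stat = ((counts - expected) ** 2 / expected).sum()
df = len(counts) - 1
p_value = chi2.sf(stat, df)
print(f"chi-square = {stat:.2f} on {df} df, p = {p_value:.3f}")

A large statistic (small p) says only that the year-to-year variation
is more than the constant-rate-plus-Poisson-noise model can account
for; deciding which years moved, and whether reporting artefacts are to
blame, is still a judgement call.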