Re: Should I be using statistics here?

Eric Bohlman Mon, 23 Feb 2004 05:55:40 -0800

[EMAIL PROTECTED] (Michael Eglinton) wrote in 
news:[EMAIL PROTECTED]:


> Hello all
> 
> I work for an organisation that receives counts of all 'notifiable'
> diseases from around NZ.
> 
> I would like to compare this year's figures with last year's and in the
> past we have used the Mantel-Haenszel chi-square test to test if there
> has been a linear relationship between years (thus indicating a
> change).  However we have a problem with this test when cell counts
> are small.
> 
> But my question is:
> 
> Should I even be using this test at all - to my mind we are using
> population data so any differences are real and therefore significant.
>  We do have non-sampling error in our estimates (e.g. people who do
> not go to a doctor, some diseases are not always recorded because of
> the number of cases received e.g. campy in Auckland) but I do not
> believe that we have a good handle on the size of this error and it
> may be similar from year to year.

I think the real question your analysis has to answer is not "are there any 
differences?" (as you point out, the answer is plainly yes) but rather "do 
the differences represent a change in the parameters of the stochastic 
process that's generating the counts?"  In other words, given what you know 
about the variability of the disease-propagation process, is the year-to-
year fluctuation within the expected range of variability, or outside it?  
This really looks more like a statistical process control problem than a 
hypothesis-testing one.

I'd be thinking in terms of using historical data for each disease to 
establish tolerance intervals for incidence rates (I'd use rates rather 
than counts to remove effects of population growth).  It *might* be 
possible to use a Poisson model, though I'd definitely check your 
historical data for overdispersion, since incidence of many contagious 
diseases won't be strictly Poisson (e.g. the probability of the next person 
being diagnosed with influenza in an interval anchored shortly after the 
last diagnosis is going to be higher than that in an interval anchored long 
after it).
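
A minimal sketch of that overdispersion check, assuming Python and a 
constant-rate Poisson model as the baseline.  The function name and the 
figures are hypothetical, not taken from the NZ data.  It pools the 
historical years into one rate, computes the expected count for each year, 
and returns the Pearson chi-square statistic divided by its degrees of 
freedom.  A value near 1 is consistent with Poisson variation; a value well 
above 1 suggests overdispersion.

```python
def dispersion_ratio(counts, populations):
    """Pearson chi-square / degrees of freedom for a constant-rate Poisson model.

    counts      -- yearly case counts for one disease
    populations -- population at risk in the same years
    """
    if len(counts) != len(populations) or len(counts) < 2:
        raise ValueError("need at least two years of matched counts and populations")

    # Pooled incidence rate under the assumption that the rate never changed.
    pooled_rate = sum(counts) / sum(populations)

    # Expected count in each year if only the population varied.
    expected = [pooled_rate * p for p in populations]

    chi_sq = sum((c - e) ** 2 / e for c, e in zip(counts, expected))
    return chi_sq / (len(counts) - 1)


# Hypothetical figures, for illustration only.
example_counts = [412, 455, 390, 520, 610]
example_pops = [3_800_000, 3_830_000, 3_860_000, 3_900_000, 3_940_000]
print(dispersion_ratio(example_counts, example_pops))
```

With only a handful of years the ratio is itself a noisy estimate, so I'd 
treat it as a warning sign rather than a formal test.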

In short, what you're really doing is testing how well a model (in this 
case one that says that incidence is a constant plus "random noise" of 
known mean and variance) fits your observed data.  And you don't have to be 
sampling to do that.
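
As a rough illustration of the "constant plus noise" model, here is a 
sketch in Python that turns historical data into control limits for the 
current year's count.  It uses plain k-sigma limits as a simpler stand-in 
for proper tolerance intervals, assumes Poisson noise optionally inflated 
by a dispersion factor, and ignores the uncertainty in the estimated 
baseline rate.  The function name, parameters, and figures are all 
hypothetical.

```python
import math


def count_control_limits(hist_counts, hist_pops, new_pop, k=3.0, dispersion=1.0):
    """Return (lower, expected, upper) limits for this year's count.

    hist_counts -- historical yearly case counts
    hist_pops   -- population at risk in those years
    new_pop     -- population at risk this year
    k           -- width of the limits in standard deviations (assumed 3)
    dispersion  -- variance inflation factor; 1.0 means pure Poisson
    """
    # Baseline rate: the "constant" part of the model.
    baseline_rate = sum(hist_counts) / sum(hist_pops)
    expected = baseline_rate * new_pop

    # Poisson variance equals the mean; scale it if the data are overdispersed.
    sd = math.sqrt(dispersion * expected)

    lower = max(0.0, expected - k * sd)
    upper = expected + k * sd
    return lower, expected, upper


# Hypothetical figures, for illustration only.
lower, expected, upper = count_control_limits(
    hist_counts=[412, 455, 390, 520],
    hist_pops=[3_800_000, 3_830_000, 3_860_000, 3_900_000],
    new_pop=3_940_000,
)
this_year = 610
print("outside limits" if not lower <= this_year <= upper else "within limits")
```

A count outside the limits is a signal that the rate may have shifted, not 
proof of it; a count inside them is consistent with the baseline model.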

Standard methods of time-series analysis are probably also applicable here.