On 28 Feb 2002 07:37:16 -0800, [EMAIL PROTECTED] (Brad Anderson) wrote:

> Rich Ulrich <[EMAIL PROTECTED]> wrote in message
> news:<[EMAIL PROTECTED]>...
> > On 27 Feb 2002 11:59:53 -0800, [EMAIL PROTECTED] (Brad Anderson)
> > wrote:

BA>
> > > I have a continuous response variable that ranges from 0 to 750. I
> > > only have 90 observations and 26 are at the lower limit of 0, which is
> > > the modal category. The mean is about 60 and the median is 3; the
> > > distribution is highly skewed, extremely kurtotic, etc. Obviously,
> > > none of the power transformations are especially useful. The product

[ snip, my own earlier comments ]

BA>
> I should have been more precise. It's technically a count variable
> representing the number of times respondents report using dirty
> needles/syringes after someone else had used them during the past 90
> days. Subjects were first asked to report the number of days they had
> injected drugs, then the average number of times they injected on
> injection days, and finally, on how many of those total times they had
> used dirty needles/syringes. All of the subjects are injection drug
> users, but not all use dirty needles. The reliability of reports near
> 0 is likely much better than the reliability of estimates near 750.
> Indeed, substantively, the difference between a 0 and a 1 is much more
> significant than the difference between a 749 and a 750--0 represents
> no risk, 1 represents at least some risk, and high values--regardless
> of their precision--represent high risk.
Okay, here is a break for some comment by me.

There are two immediate aims of analysis: to show that results are
extreme enough that they don't happen by chance (statistical testing),
and to characterize the results so that people can understand them
(estimation). When the mean is 60 and the median is 3, reporting
averages as if they were reports on central tendency is not going to
help much with either aim.

If you want to look at outcomes, you make groups (as you did) that seem
somewhat homogeneous: 0 (if it is); 1; 2-3; ... and eventually your top
group of 90+, which comes out to 'daily' and seems reasonable as a
top end. Using groups ought to give you a robust test, whatever you are
testing, unless those distinctions between 10 and 500 needle-sticks
become important. Using groups also lets you inspect, in particular,
the means for 0, 1, 2 and 3.

I started thinking that the dimension is something like 'promiscuous
use of dirty needles', and I realized that an analogy to risky sex was
not far wrong. Or, at any rate, it doesn't seem far wrong to me. But
your measure (the one that you mention, anyway) does not distinguish
between 1 act each with 100 risky partners and 100 acts with one.

Anyway, one way to describe the groups is to have some experts place
the reports of behaviors into 'risk groups', or assign the risk scores
directly. Assuming that those scores do describe your sample without
great non-normality, you should be able to use averages of risk scores
for a technical level of testing and reporting, and convert them back
to the verbal anchor-descriptions in order to explain what they mean.

[ ... Q about zero; kurtosis. ]

RU>
> > Categorizing the values into a few categories labeled
> > "none, almost none, ...." is one way to convert your scores.
> > If those labels do make sense.

> Makes sense at the low end: 0 risk. And at the high end I used 90+,
> representing using a dirty needle/syringe once a day or more often.
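As a minimal sketch of the grouping idea, here is how the ordered risk
groups might be coded up. Only the 0, 1, 2-3, and 90+ cut-points come
from the discussion above; the middle boundaries (and everything else
in the snippet) are hypothetical, and the poster's own categories would
replace them.

```python
from bisect import bisect_left
from statistics import mean

# BOUNDS[i] is the largest count included in group LABELS[i]; counts
# above the last bound fall into the open-ended top group "90+".
# Only 0, 1, 2-3, and 90+ come from the thread; middle cuts are made up.
BOUNDS = [0, 1, 3, 9, 29, 89]
LABELS = ["0", "1", "2-3", "4-9", "10-29", "30-89", "90+"]

def risk_group(count):
    """Map a raw 90-day dirty-needle count to its ordered risk group."""
    return LABELS[bisect_left(BOUNDS, count)]

def group_means(counts):
    """Mean raw count within each risk group -- e.g. to inspect the
    means for the 0, 1, and 2-3 groups as suggested above."""
    by_group = {}
    for c in counts:
        by_group.setdefault(risk_group(c), []).append(c)
    return {g: mean(v) for g, v in by_group.items()}
```

The ordered labels can then carry expert-assigned risk scores for
testing, while the raw within-group means stay available for
description.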
> The 2 middle categories were pretty arbitrary.

[ snip, other procedures ]

> One of the other posters asked about the appropriate error term--I
> guess that lies at the heart of my inquiry. I have no idea what the
> appropriate error term would be, or how best to model such data. I
> often deal with similar response variables that have distributions in
> which observations are clustered at 1 or both ends of the continuum.
> In most cases, these distributions are not even approximately
> 'unimodal and a bit skewed'--the kind of variables for which
> normalizing power transformations make sense. Additionally, these
> typically aren't outcomes that could be thought of as being generated
> by a gaussian process.

Can you describe them usefully? What is the shape of the behaviors
that you observe or expect, corresponding to the drop-off of density
near either extreme?

> In some cases I think it makes sense to consider poisson and
> generalizations of poisson processes, although there is clearly much
> greater between-subject heterogeneity than assumed by a poisson
> process. I estimated poisson and negative binomial regression
> models--there was compelling evidence that the poisson was
> overdispersed. I also used a Vuong statistic to compare NB regression

[ ... snip, more detail ]

> I think a lot of folks just run standard analyses or arbitrarily
> apply some "normalizing" transformation because that's what's done in
> their field, then report the results without really examining the
> underlying distributions. I'm curious how folks proceed when they
> encounter very goofy distributions. Thanks for your comments.

Most folks don't know of anything more than ranks, or they would be
afraid to try to justify it -- IF there is not something 'done in
their field'. Actually, the precedent of the field ought to be one of
the first things tried.

Agresti has the example with groups formed on reported alcohol
consumption.
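The poster's formal comparison used negative binomial regression and a
Vuong statistic; the quick moment check that motivates it can be
sketched in a few lines. Under a Poisson model the variance equals the
mean, so a sample ratio far above 1 is the usual first signal of
overdispersion. The data below are made up to resemble the described
shape (many zeros, long right tail), not the poster's actual numbers.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean.  Under a Poisson model
    the two are equal, so a ratio far above 1 signals overdispersion
    and points toward something like a negative binomial instead."""
    return variance(counts) / mean(counts)

# Made-up data shaped like the thread describes: a large spike at 0,
# a low median, and a long right tail out toward 750.
counts = [0] * 26 + [1, 2, 3, 3, 5, 8, 15, 40, 120, 300, 750]
ratio = dispersion_ratio(counts)  # far above 1 for data like these
```

This is only the informal diagnostic; the likelihood-based comparison
of Poisson against negative binomial (or the Vuong test against a
zero-inflated variant) is what carries the inferential weight.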
In his "An Introduction to Categorical Data Analysis", he shows
alternative analyses where the categories are scored with the
mid-range of consumption; with ranks; and with 0-5. Because of the
group sizes, using ranks proved to be the weakest analysis. (I don't
know how a logit analysis fares.)

Now, I have learned to be a bit allergic to using zero as 'none', an
equal step from 1 in the expected direction. If I remember right, I
found a stronger test on his example when I omitted that group.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
http://jse.stat.ncsu.edu/
=================================================================