On 28 Feb 2002 07:37:16 -0800, [EMAIL PROTECTED] (Brad Anderson) wrote:

> Rich Ulrich <[EMAIL PROTECTED]> wrote in message
> news:<[EMAIL PROTECTED]>...
> > On 27 Feb 2002 11:59:53 -0800, [EMAIL PROTECTED] (Brad Anderson)
> > wrote:

BA>
> > > I have a continuous response variable that ranges from 0 to 750. I
> > > only have 90 observations and 26 are at the lower limit of 0, which is
> > > the modal category. The mean is about 60 and the median is 3; the
> > > distribution is highly skewed, extremely kurtotic, etc. Obviously,
> > > none of the power transformations are especially useful. The product

[ snip, my own earlier comments ]

BA>
> I should have been more precise. It's technically a count variable
> representing the number of times respondents report using dirty
> needles/syringes after someone else had used them during the past 90
> days. Subjects were first asked to report the number of days they had
> injected drugs, then the average number of times they injected on
> injection days, and finally, on how many of those total times they had
> used dirty needles/syringes. All of the subjects are injection drug
> users, but not all use dirty needles. The reliability of reports near
> 0 is likely much better than the reliability of estimates near 750.
> Indeed, substantively, the difference between a 0 and a 1 is much more
> significant than the difference between a 749 and a 750--0 represents
> no risk, 1 represents at least some risk, and high values--regardless
> of their precision--represent high risk.
Okay, here is a break for some comment by me.

There are two immediate aims of analysis: to show that results are
extreme enough that they don't happen by chance (statistical testing),
and to characterize the results so that people can understand them
(estimation). When the mean is 60 and the median is 3, reporting
averages as if they were reports on central tendency is not going to
help much with either aim.

If you want to look at outcomes, you make groups (as you did) that seem
somewhat homogeneous: 0 (if it is); 1; 2-3; ... and eventually your top
group of 90+, which comes out to 'daily' and seems reasonable as a
top end. Using groups ought to give you a robust test, whatever you are
testing, unless those distinctions between 10 and 500 needle-sticks
become important. Using groups also lets you inspect, in particular,
the means for 0, 1, 2 and 3.

I started thinking that the dimension is something like 'promiscuous
use of dirty needles', and I realized that an analogy to risky sex was
not far wrong. Or, at any rate, it doesn't seem far wrong to me. But
your measure (the one that you mention, anyway) does not distinguish
between 1 act each with 100 risky partners and 100 acts with one.

Anyway, one way to describe the groups is to have some experts place
the reports of behaviors into 'risk groups', or assign the risk scores
directly. Assuming that those scores do describe your sample without
great non-normality, you should be able to use averages of risk scores
for a technical level of testing and reporting, and convert them back
to the verbal anchor-descriptions in order to explain what they mean.

[ ... Q about zero; kurtosis. ]

RU>
> > Categorizing the values into a few categories labeled
> > "none, almost none, ...." is one way to convert your scores.
> > If those labels do make sense.

> Makes sense at the low end: 0 risk. And at the high end I used 90+,
> representing using a dirty needle/syringe once a day or more often.
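As a minimal sketch of the grouping idea, here is how the ordered risk
groups might be coded up. Only the 0, 1, 2-3, and 90+ cut-points come
from the discussion above; the middle boundaries (and everything else
in the snippet) are hypothetical, and the poster's own categories would
replace them.

```python
from bisect import bisect_left
from statistics import mean

# BOUNDS[i] is the largest count included in group LABELS[i]; counts
# above the last bound fall into the open-ended top group "90+".
# Only 0, 1, 2-3, and 90+ come from the thread; middle cuts are made up.
BOUNDS = [0, 1, 3, 9, 29, 89]
LABELS = ["0", "1", "2-3", "4-9", "10-29", "30-89", "90+"]

def risk_group(count):
    """Map a raw 90-day dirty-needle count to its ordered risk group."""
    return LABELS[bisect_left(BOUNDS, count)]

def group_means(counts):
    """Mean raw count within each risk group -- e.g. to inspect the
    means for the 0, 1, and 2-3 groups as suggested above."""
    by_group = {}
    for c in counts:
        by_group.setdefault(risk_group(c), []).append(c)
    return {g: mean(v) for g, v in by_group.items()}
```

The ordered labels can then carry expert-assigned risk scores for
testing, while the raw within-group means stay available for
description.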
> The 2 middle categories were pretty arbitrary.

[ snip, other procedures ]

> One of the other posters asked about the appropriate error term--I
> guess that lies at the heart of my inquiry. I have no idea what the
> appropriate error term would be, or how best to model such data. I
> often deal with similar response variables that have distributions in
> which observations are clustered at 1 or both ends of the continuum.
> In most cases, these distributions are not even approximately
> 'unimodal and a bit skewed'--the kind of variables for which
> normalizing power transformations make sense. Additionally, these
> typically aren't outcomes that could be thought of as being generated
> by a gaussian process.

Can you describe them usefully? What is the shape of the behaviors
that you observe or expect, corresponding to the drop-off of density
near either extreme?

> In some cases I think it makes sense to consider poisson and
> generalizations of poisson processes, although there is clearly much
> greater between-subject heterogeneity than assumed by a poisson
> process. I estimated poisson and negative binomial regression
> models--there was compelling evidence that the poisson was
> overdispersed. I also used a Vuong statistic to compare NB regression

[ ... snip, more detail ]

> I think a lot of folks just run standard analyses or arbitrarily
> apply some "normalizing" transformation because that's what's done in
> their field, then report the results without really examining the
> underlying distributions. I'm curious how folks proceed when they
> encounter very goofy distributions. Thanks for your comments.

Most folks don't know of anything more than ranks, or they would be
afraid to try to justify it -- IF there is not something 'done in
their field'. Actually, the precedent of the field ought to be one of
the first things tried.

Agresti has the example with groups formed on reported alcohol
consumption.
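The poster's formal comparison used negative binomial regression and a
Vuong statistic; the quick moment check that motivates it can be
sketched in a few lines. Under a Poisson model the variance equals the
mean, so a sample ratio far above 1 is the usual first signal of
overdispersion. The data below are made up to resemble the described
shape (many zeros, long right tail), not the poster's actual numbers.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean.  Under a Poisson model
    the two are equal, so a ratio far above 1 signals overdispersion
    and points toward something like a negative binomial instead."""
    return variance(counts) / mean(counts)

# Made-up data shaped like the thread describes: a large spike at 0,
# a low median, and a long right tail out toward 750.
counts = [0] * 26 + [1, 2, 3, 3, 5, 8, 15, 40, 120, 300, 750]
ratio = dispersion_ratio(counts)  # far above 1 for data like these
```

This is only the informal diagnostic; the likelihood-based comparison
of Poisson against negative binomial (or the Vuong test against a
zero-inflated variant) is what carries the inferential weight.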
In his "An Introduction to Categorical Data Analysis", he shows
alternative analyses where the categories are scored with the
mid-range of consumption; with ranks; and with 0-5. Because of the
group sizes, using ranks proved to be the weakest analysis. (I don't
know how a logit analysis fares.)

Now, I have learned to be a bit allergic to using zero as 'none', an
equal step from 1 in the expected direction. If I remember right, I
found a stronger test on his example when I omitted that group.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
http://jse.stat.ncsu.edu/
=================================================================