OK, I'll give more info on what I am doing. But first, thanks Don & Dennis for your input thus far!
I have a data file of effect sizes (d) from several hundred neuroimaging (MRI) studies of the brain in Alzheimer's disease, covering about 100 different structures & regions of the brain. Each effect size is for the volume difference between patients and healthy comparison subjects in the published studies. The studies vary widely in which brain sites are reported. Furthermore, some brain regions have lots of studies (50+) while other brain regions are only scantily studied (5 or fewer studies).
I want to display the effect sizes in a table, arranged according to brain regions -- i.e., row labels will be brain structures; column headers will be stats (N, mean, median, etc.). I'm using SPSS Frequencies (with the "notable" option and pivoting the x-y axes from the default output).
The expected readers will include graduate students, neuroimaging researchers, and clinicians specializing in this disorder -- who would have at least a working familiarity with common descriptive stats, questions of normality, outlier effects, and effect sizes -- but who would, I would guess, find much more than that uninteresting.
I see the value of a graphic display of distributions; however, given the large number of measures, it does not seem practical (it would be overwhelming).
Fortunately, effect sizes for the majority of measures closely approximate a normal distribution (absolute values of the skewness & kurtosis well below 1.0). However, enough measures are skewed that it would be misleading to report only the Means and SDs of their effect sizes (as you might expect, these tend to be the measures for which the N of studies is relatively small).
In view of all this, the Median appears to be the best stat to use for listing the "average" effect sizes on each measure, so I will put that in my table as the main effect size measure: the Median is clearly best for skewed measures, and if measures have a nearly normal distribution the Median and Mean are virtually identical anyway.
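A minimal sketch of the kind of per-region table described above, written in Python rather than SPSS purely for illustration (the analysis in this post uses SPSS Frequencies). The region names and effect sizes are invented, and `summarize` is a hypothetical helper, not anything from the original data file. It only shows how a gap between the Mean and the Median flags a skewed measure.

```python
import statistics

# Hypothetical effect sizes (d) per region; names and values are invented.
effect_sizes = {
    "region_A": [0.9, 1.0, 1.1, 1.15, 1.2, 1.3, 1.4],  # roughly symmetric
    "region_B": [0.5, 0.6, 0.65, 0.7, 2.4],            # one extreme study
}

def summarize(values):
    """Return N, median, mean, and sample SD for one region's effect sizes."""
    return {
        "n": len(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
    }

# One row per region, one column per statistic.
table = {region: summarize(vals) for region, vals in effect_sizes.items()}

for region, row in table.items():
    gap = row["mean"] - row["median"]  # a large gap suggests skew
    print(
        f"{region:10s} N={row['n']:2d} median={row['median']:.2f} "
        f"mean={row['mean']:.2f} SD={row['sd']:.2f} mean-median={gap:+.2f}"
    )
```

In this made-up data the Mean and Median agree for region_A, while the single extreme study in region_B pulls the Mean well above the Median.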
I will list the Mean right after the Median, which will help people see if a particular measure is skewed -- when the Median & Mean are discrepant.
I am more hesitant about what indices of dispersion to use. I will list the SD right after the Mean, since it is an appropriate index in many cases. My co-author wants to include Min & Max values, because he is writing a discussion about the very high heterogeneity of effect sizes among studies. I am OK with that, but tend to lean more towards using some specific percentiles, because these are less wildly influenced by extreme but relatively rare outliers. Based on your comments, I will abandon the idea of trying to use percentiles that are somehow comparable to the SD.
The central 90% range (5th and 95th percentiles) is popular. I tried it out, and it works well when the N is large. As you would expect, when the N is small (12 or fewer), these percentiles are virtually identical to the Min and Max -- and just as prone to being overly influenced by single outliers. That leads me to consider a narrower interval, such as the interquartile range (25th and 75th percentiles) -- which works well down to N's of about 6.
Of course, I could list all of them: N, Median, Mean, SD, Min, 5th, 25th, 75th, 95th, Max -- but that seems overkill. Any suggestion would be appreciated.
John
P.S. Does "semi-interquartile range" refer to the same thing as "interquartile range"?
==================================
From: [EMAIL PROTECTED] (Donald Burrill)
You haven't said anything about your audience/readership. I (for one!) would have different recommendations for a report intended for general public consumption (e.g., parents of schoolchildren) vs. a technical report aimed at statistical experts of one stripe or another.
(For the former, skewness and kurtosis would indeed be overkill, as you suggest; and even for the latter, it is not clear to me that those values convey much useful information, possibly apart from suggesting the degree to which the distribution(s) in question depart from "normality" (aka a Gaussian distributional model).)
To answer your question, I'd recommend one-line box-&-whisker plots (especially if you're displaying anything in graphical form), or equivalently five-number summaries (extremes, quartiles, median in their order of magnitude). But this is a sort of out-of-the-blue answer, and does not take into account any characteristics either of your audience(s) or of your reasons for wanting to report dispersions (apart, perhaps, from a sense of clerical completeness...).
=================================
From: [EMAIL PROTECTED] (Dennis Roberts)
as don suggested, it is hard to know how to advise if we don't know the level of the audience
but, as a general rule ... NONE of these summary statistics really tell you about the distribution
so, my first question is: can you show a picture of the distributions ... like dotplots?
don't worry about trying to find some variability measure that goes along with the median ... while the semi interquartile range is sometimes used ... there is nothing inherently connected between it and the median ...
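As a rough illustration of the percentile question raised in this thread, the sketch below computes a five-number summary plus the 5th and 95th percentiles for a small invented sample, using Python's standard `statistics.quantiles` with `method="inclusive"`. The data and the `spread_summary` helper are hypothetical, and percentile definitions differ between packages, so SPSS may report somewhat different values for the same data. The semi-interquartile range is computed under its usual definition, half the interquartile range, i.e. (Q3 - Q1) / 2.

```python
import statistics

def spread_summary(values):
    """Five-number summary plus 5th/95th percentiles, IQR, and semi-IQR."""
    data = sorted(values)
    # Quartiles: three cut points dividing the data into four parts.
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # Twentieths: the first cut is the 5th percentile, the last is the 95th.
    cuts = statistics.quantiles(data, n=20, method="inclusive")
    return {
        "min": data[0], "p5": cuts[0], "q1": q1, "median": med,
        "q3": q3, "p95": cuts[-1], "max": data[-1],
        "iqr": q3 - q1,
        "semi_iqr": (q3 - q1) / 2,  # half the interquartile range
    }

# Hypothetical small-N measure (6 studies), with and without a wilder outlier.
small = [0.4, 0.8, 0.9, 1.0, 1.1, 2.6]
wilder = [0.4, 0.8, 0.9, 1.0, 1.1, 5.0]

base = spread_summary(small)
shifted = spread_summary(wilder)

for label, s in (("small", base), ("wilder", shifted)):
    print(
        f"{label:7s} min={s['min']:.2f} p5={s['p5']:.3f} q1={s['q1']:.3f} "
        f"median={s['median']:.2f} q3={s['q3']:.3f} p95={s['p95']:.3f} "
        f"max={s['max']:.2f} semi-IQR={s['semi_iqr']:.3f}"
    )
```

With only six values, the 95th percentile moves along with the single extreme study while the quartiles stay put, which is consistent with the small-N behaviour John describes.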
