OK, I'll give more info on what I am doing. But first, thanks Don & Dennis
for your input thus far!

I have a data file of effect sizes (d) from several hundred neuroimaging
(MRI) studies of the brain in Alzheimer's disease. These are on about 100
different structures & regions of the brain. Each effect size is for the
volume difference between patients and healthy comparison subjects in the
published studies. The studies vary widely in which brain sites are
reported. Furthermore, some brain regions have lots of studies (50+) while
other brain regions are only scantily studied (5 or fewer studies). I want
to display the effect sizes in a table, arranged according to brain
regions -- i.e., row labels will be brain structures; column headers will
be stats (N, mean, median, etc.). I'm using SPSS frequencies (with the
"notable" option and pivoting the x-y axes from the default output). The
expected readers will include graduate students, neuroimaging researchers,
and clinicians specializing in this disorder -- who would have at least a
working familiarity with common descriptive stats, questions of normality,
outlier effects, and effect sizes -- but would probably find much more
detail than that uninteresting, I would guess.

I see the value of a graphic display of distributions; however, given the
large number of measures (about 100), it does not seem practical: that
many plots would overwhelm the reader.

Fortunately, effect sizes for the majority of measures closely approximate
a normal distribution (absolute values of skewness & kurtosis well below
1.0). However, enough measures are skewed that it would be misleading to
report only the Means and SDs of their effect sizes (as you might expect,
these tend to be the measures for which the N of studies is relatively
small). In view of all this, the Median appears to be the best stat to use
for listing the "average" effect size on each measure, so I will put that
on my table as the main effect size measure: the Median is clearly best
for skewed measures, and if a measure has a nearly normal distribution the
Median and Mean are virtually identical anyway. I will list the Mean right
after the Median, which will help people see whether a particular measure
is skewed -- i.e., when the Median & Mean are discrepant.
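
A minimal sketch of the table layout described above, written in Python rather than SPSS and using made-up effect sizes. The region names, the values, and the `summarize` helper are illustrative assumptions only, not the actual data or workflow:

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical (region, effect size d) pairs; the real data would come
# from the effect-size file described in the message.
records = [
    ("hippocampus", 1.10), ("hippocampus", 1.30), ("hippocampus", 1.25),
    ("hippocampus", 1.05), ("hippocampus", 1.40),
    ("amygdala", 0.60), ("amygdala", 0.70), ("amygdala", 0.65),
    ("amygdala", 2.40),  # one extreme study pulls the mean upward
]

def summarize(records):
    """Return {region: {"N", "median", "mean", "sd"}} for each brain region."""
    by_region = defaultdict(list)
    for region, d in records:
        by_region[region].append(d)
    table = {}
    for region, values in by_region.items():
        table[region] = {
            "N": len(values),
            "median": median(values),
            "mean": mean(values),
            # The sample SD is undefined for a single study.
            "sd": stdev(values) if len(values) > 1 else float("nan"),
        }
    return table

table = summarize(records)
# One row per region, Median first and Mean right after it.
for region, row in sorted(table.items()):
    print(f"{region:12s} N={row['N']:2d}  Mdn={row['median']:.2f}  "
          f"M={row['mean']:.2f}  SD={row['sd']:.2f}")
```

In this toy data the Median and Mean nearly agree for the roughly symmetric region and diverge for the one containing an extreme study, which is the visual cue the Median-then-Mean ordering is meant to give.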

I am more hesitant about what indices of dispersion to use. I will list
the SD right after the Mean, since it is an appropriate index in many
cases. My co-author wants to include Min & Max values, because he is
writing a discussion about the very high heterogeneity of effect sizes
among studies. I am OK with that, but lean more towards using some
specific percentiles, because these are less wildly influenced by extreme
but relatively rare outliers.

Based on your comments, I will abandon the idea of trying to use
percentiles that are somehow comparable to the SD. The central 90% range
(5th to 95th percentile) is popular; strictly speaking it is a percentile
range, not a confidence interval. I tried it out, and it works well when
the N is large. As you would expect, when the N is small (12 or less),
these percentiles are virtually identical to the Min and Max -- and just
as prone to being overly influenced by single outliers. That leads me to
consider a narrower interval, such as the inter-quartile range (25th to
75th percentile) -- which works well down to N's of about 6.
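
The small-N behaviour described above can be illustrated with a short Python sketch (Python 3.8+ for `statistics.quantiles`). The data are invented, `percentile_pair` is a hypothetical helper, and the "inclusive" linear-interpolation rule used here is only one of several percentile definitions, so SPSS or other packages may give somewhat different values:

```python
from statistics import quantiles

def percentile_pair(values, lower_pct):
    """Return the (lower_pct, 100 - lower_pct) percentiles, e.g. 25 -> (25th, 75th)."""
    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile.
    # method="inclusive" interpolates linearly between the observed Min and Max.
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[100 - lower_pct - 1]

# Hypothetical effect sizes for one sparsely studied region (N = 8).
small_n = [0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.5]
# Same studies, but with the single largest value made more extreme.
with_outlier = small_n[:-1] + [5.0]

for label, data in (("original", small_n), ("bigger outlier", with_outlier)):
    p5, p95 = percentile_pair(data, 5)
    q1, q3 = percentile_pair(data, 25)
    print(f"{label:15s} Min={min(data):.2f} P5={p5:.2f} Q1={q1:.2f} "
          f"Q3={q3:.2f} P95={p95:.2f} Max={max(data):.2f}")
```

With only eight values, the 95th percentile is interpolated between the two largest observations, so it moves when the outlier moves, while both quartiles stay where they were.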

Of course, I could list all of them: N Median Mean SD Min 5th 25th 75th
95th Max -- but that seems overkill.

Any suggestions would be appreciated.

John

P.S. Does "semi-interquartile range" refer to the same thing as
"interquartile range"?

==================================
From: [EMAIL PROTECTED] (Donald Burrill)

You haven't said anything about your audience/readership.  I (for one!)
would have different recommendations for a report intended for general
public consumption (e.g., parents of schoolchildren) vs. a technical
report aimed at statistical experts of one stripe or another.  (For the
former, skewness and kurtosis would indeed be overkill, as you suggest;
and even for the latter, it is not clear to me that those values convey
much useful information, possibly apart from suggesting the degree to
which the distribution(s) in question depart from "normality" (aka a
Gaussian distributional model).)

To answer your question, I'd recommend one-line box-&-whisker plots
(especially if you're displaying anything in graphical form), or
equivalently five-number summaries (extremes, quartiles, median in their
order of magnitude).  But this is a sort of out-of-the-blue answer, and
does not take into account any characteristics either of your
audience(s) or of your reasons for wanting to report dispersions (apart,
perhaps, from a sense of clerical completeness...).
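
As a rough illustration of the five-number summary mentioned above, here is a minimal Python sketch (Python 3.8+). The function name and the sample values are assumptions for the example, and the quartile rule ("inclusive" linear interpolation) is one convention among several:

```python
from statistics import median, quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) in order of magnitude."""
    ordered = sorted(values)
    # n=4 yields three cut points: Q1, the median, and Q3.
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median(ordered), q3, ordered[-1])

# Hypothetical effect sizes for a single brain region.
example = [1.2, 0.4, 1.6, 0.9, 1.1]
labels = ("Min", "Q1", "Mdn", "Q3", "Max")
# Print the summary as one table row, mirroring a one-line box plot.
print("  ".join(f"{name}={value:.2f}"
                for name, value in zip(labels, five_number_summary(example))))
```

Each region would contribute one such row, which carries the same information a one-line box-and-whisker plot would show.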

=================================
From: [EMAIL PROTECTED] (Dennis Roberts)

as don suggested, it is hard to know how to advise if we don't know the
level of the audience

but, as a general rule ... NONE of these summary statistics really tell
you about the distribution so, my first question is: can you show a
picture of the distributions ... like dotplots?

don't worry about trying to find some variability measure that goes along
with the median ... while the semi interquartile range is sometimes used
... there is nothing inherently connected between it and the median ...
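
On the P.S. question, for reference: under the usual textbook definitions the two terms are related but not the same thing. Writing Q_1 and Q_3 for the 25th and 75th percentiles, the semi-interquartile range is half the interquartile range:

```latex
\mathrm{IQR} = Q_3 - Q_1,
\qquad
\mathrm{SIQR} = \frac{Q_3 - Q_1}{2} = \frac{\mathrm{IQR}}{2}
```

For example, if Q_1 = 0.6 and Q_3 = 1.0 (made-up values), the IQR is 0.4 and the semi-interquartile range is 0.2.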

=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
.                  http://jse.stat.ncsu.edu/                    .
=================================================================