I have enjoyed the comments I read on this. I want to point to a couple of additional conclusions that are possible concerning these summaries of raw data.
> On Sun, 13 Oct 2002, Maleck Kcelam wrote:
>
> > Dear sir or madam, I was making an experiment and I have a small
> > problem to write my final results. I've counted grain sizes in a metal
> > sample using a specific software and I obtained the following data:

On 13 Oct 2002 14:43:16 -0700, [EMAIL PROTECTED] (Donald Burrill) wrote:

 - I am citing DB, for his improvement of the text -

> [re-formatted for compactness -- DFB]
>
> > measure   mean   variance   st.dev.   C.V.   min.   max.     N
> >    1       3.6     22.3       4.7     1.32    1.1   88.5   4376
> >    2       4.3     18.3       4.3     0.98    1.5   96.9   4151

1) mean = 3.6, max = 88.5, SD = 4.72. The z-score of the max is 18 -- an extreme I've seldom seen. Since the "total variance" of z-scores, meaning the total sum of squares around the mean, is equal to the DF = 4375, the single max value accounts for 18^2 = 324, or 324/4375 = 7.4% of the variance.

2) mean = 4.3, max = 96.9, SD = 4.28. The z-score of the max is 21.6 -- even more extreme. Here, z-squared is 468, and 468/4150 = 11.3% of the variance.

On the original scales, each sample has at least *one* huge outlier. By the way, there can't be a dozen scores that extreme, because the total SS has to add up.

If the original scaling is interesting, one question would be: how small are the mean and SD of the rest, once you decide to trim-and-describe a handful (how many?) of outliers?

Further: the log-transform will also leave big outliers, since the median is *not* midway between the min and max after transformation. For that, the medians would have to be about 9.0, the geometric mean of 1 and 81 (say). Since the two means are about 4, and the skews are extreme, the medians must be even smaller than 4. It might be that the medians are small enough that the *reciprocal* transformation will yield symmetry.

Does any transformation make sense? What is the purpose? You can't use least-squares test statistics while you have huge outliers.
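For anyone who wants to check the arithmetic above, here is a minimal sketch in Python. It uses only the summary numbers quoted in the table; the function name max_outlier_summary and the dictionary layout are my own, purely for illustration. It works from the unrounded SD, so the z-scores and percentages can differ in the last digit from the rounded figures in the text.

```python
import math

# Summary statistics exactly as quoted in the table (no raw data available).
samples = {
    1: {"mean": 3.6, "variance": 22.3, "min": 1.1, "max": 88.5, "n": 4376},
    2: {"mean": 4.3, "variance": 18.3, "min": 1.5, "max": 96.9, "n": 4151},
}

def max_outlier_summary(s):
    """Hypothetical helper: how much of the total SS the single max accounts for."""
    sd = math.sqrt(s["variance"])
    z = (s["max"] - s["mean"]) / sd          # z-score of the maximum
    share = z ** 2 / (s["n"] - 1)            # z^2 over the total SS of z-scores (= DF)
    # Value that would sit midway between min and max on a log scale:
    log_midpoint = math.sqrt(s["min"] * s["max"])
    return {"sd": sd, "z": z, "share": share, "log_midpoint": log_midpoint}

results = {k: max_outlier_summary(s) for k, s in samples.items()}

for k, r in results.items():
    print(f"measure {k}: SD={r['sd']:.2f}  z(max)={r['z']:.1f}  "
          f"share of SS={100 * r['share']:.1f}%  "
          f"log-scale midpoint={r['log_midpoint']:.1f}")
```

With the actual min and max, the log-scale midpoints come out near 9.9 and 12.1, a little above the illustrative 9.0 (from 1 and 81) and still well above means of about 4.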
You can't regard the raw mean as an indicator of "central tendency", if that was your intention. One useful comment on the distribution might be the difference (whatever it is) between the mean and the median. But you can still look at the mean as a "parameter" if there is a distribution that it might usefully index.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
