On Sun, 13 Oct 2002, Maleck Kcelam wrote:

> Dear sir or madam, I was making an experiment and I have a small
> problem to write my final results. I've counted grain sizes in a metal
> sample using a specific software and I obtained the following data:

[re-formatted for compactness -- DFB]

> measure   mean   variance   st.dev.   C.V.   min.   max.      N
>    1       3.6     22.3       4.7     1.32    1.1   88.5    4376
>    2       4.3     18.3       4.3     0.98    1.5   96.9    4151

> So, basically, I have three questions:
>
> 1) Usually I would have written (average +/- standard deviation) but
> the standard deviation is superior than my average ! How can I
> represent my final result or what should I say because I did several
> other measurements and they are quite the same?

Evidently the distribution of your measurements is highly skewed. It
follows that the sample mean cannot be trusted as a measure of
expectation, and that the sample variance is highly inflated by the few
very large values (maximum about 90, minimum about 1 to 1.5, mean about
4), which is why st.dev. > average for both data sets.

Display your distributions, either as dotplots or frequency tables (or
possibly stem-&-leaf diagrams, but these are likely to be cumbersome
with such large sample sizes). Investigate the very large values: are
they real? Do they belong with the bulk of your observations? If "yes"
to both questions, consider

 (a) reporting order statistics instead of average and s.d. (median,
     quartiles, possibly 10th and 90th percentiles) so as to show the
     shape of the distribution you're trying to summarize (and if you
     like to report results in diagrams, a pair of box plots, one for
     each data set, would be informative to your readers);

 (b) taking logarithms of your data values, displaying those
     distributions, and using them if they're (nearly) symmetrical. In
     that case the mean & s.d. of the log values would be reasonable
     summary statistics, and the antilog of the mean logarithm is the
     geometric mean of the original data.

> 2) Can I express my result as (average +/- coefficient of variation)?

This does not appear to make any intuitive sense. Why would one?

> 3) I need to represent this measures in one single number, so, how
> would I unite the two measurements? Which formula should I use?

If you report order statistics, all you can do is combine the two data
sets into one (N = 8527) and find the median etc. of the combined data.
If logarithms give you reasonable distributions, the usual "single
number" would be the mean of the combined sample (with the averages
taken on the log scale):

    (N1*(average1) + N2*(average2)) / (N1 + N2).

> (mean average of MEASURE 1+ MEASURE 2) +/- (????)

The "+/- (????)" does not look to me like a "single number"; it looks
more like two numbers, an average of some kind +/- a measure of
uncertainty. If you end up using means and s.d.s (either because you
cast out the very high values as not properly belonging to your data,
or because you used a transformation (e.g., log) that made the
distribution nearly symmetric), it might be appropriate to use a pooled
standard deviation:

    pooled variance = (var1*(N1-1) + var2*(N2-1)) / (N1 + N2 - 2),
    pooled s.d.     = square root of pooled variance.

Not everyone would agree that this is appropriate, however; and knowing
nothing about the field in which you're operating, I cannot offer a
useful opinion on this point.

> Could you possibly help me with this problem? Thank you in advance and
> I hope to hearing from you soon.

 -----------------------------------------------------------------------
 Donald F. Burrill                                    [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                (603) 626-0816
 [Old address:  184 Nashua Road, Bedford, NH 03110       (603) 471-7128]
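
The summaries discussed in this reply (order statistics, log-scale mean
and s.d. with the geometric mean, the combined mean, and the pooled
s.d.) can be sketched in Python roughly as follows. This is only a
minimal illustration using the standard library: the function names
(order_summary, log_summary, combined_mean, pooled_sd) are hypothetical,
the percentile method is whatever statistics.quantiles uses by default,
and the raw grain-size measurements are not available here, so the
example at the bottom plugs in only the table's summary figures.

```python
import math
import statistics


def order_summary(values):
    """Median, quartiles, and 10th/90th percentiles of the data."""
    deciles = statistics.quantiles(values, n=10)   # 9 cut points
    quartiles = statistics.quantiles(values, n=4)  # 3 cut points
    return {
        "p10": deciles[0],
        "q1": quartiles[0],
        "median": statistics.median(values),
        "q3": quartiles[2],
        "p90": deciles[8],
    }


def log_summary(values):
    """Mean and s.d. of the natural logs, plus the geometric mean.

    Assumes every value is strictly positive (true of grain sizes).
    """
    logs = [math.log(v) for v in values]
    mean_log = statistics.fmean(logs)
    return {
        "mean_log": mean_log,
        "sd_log": statistics.stdev(logs),
        # The antilog of the mean logarithm is the geometric mean.
        "geometric_mean": math.exp(mean_log),
    }


def combined_mean(n1, mean1, n2, mean2):
    """(N1*average1 + N2*average2) / (N1 + N2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


def pooled_sd(n1, var1, n2, var2):
    """Square root of (var1*(N1-1) + var2*(N2-1)) / (N1 + N2 - 2)."""
    pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


if __name__ == "__main__":
    # Arithmetic illustration only: these are the raw-scale figures from
    # the table, whereas the reply suggests means and s.d.s only after
    # removing unjustified extreme values or transforming (e.g. logs).
    print(combined_mean(4376, 3.6, 4151, 4.3))
    print(pooled_sd(4376, 22.3, 4151, 18.3))
```

With the actual measurements in hand, one would call order_summary or
log_summary on each data set (or on the two combined) and feed the
log-scale means and variances, rather than the raw ones, into
combined_mean and pooled_sd.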
