I've been thinking a bit more about standard deviation -- it's been quite a while since I've done anything much at this level, so I'm still going over the basics.
Anyways, I still have not been convinced to abandon my concept of "substandard deviation" in favor of "standard deviation".

For one thing, the reasoning for using N-1 in the denominator of standard deviation seems specious. For another, in most cases where it really matters (correlation, for example), it just cancels out. For another, "standard deviation" jumps from a manageable quantity to an unknowable quantity when the result is completely determined (with a single sample, for example, the N-1 divisor is zero, leaving 0 divided by 0), and it seems to me that it should be zero in that case. For another, my "qualitative justification" (that it's accounting for some unknown uncertainty in the mean) seems wrong for cases where we have a well-understood model.

To my mind, the reasoning for the N-1 factor in standard deviation may be validly applied to the mean instead -- if I include all the deviation terms AND the deviation of the mean itself from itself (which is zero, so the sum is unchanged), I should exclude the count of the mean from the divisor, which lands me back at plain N. But no one bothers to express it that way, so this seems an exercise in futility.

Anyways, the traditional definition for mean in J is +/ % # -- and this is a nice definition. However, statistically speaking, it takes advantage of a quirk: that each outcome is equally weighted. A more general definition of mean would be +/@:* where one argument is the potential values and the other argument is their corresponding probabilities (and %@# is the probability we use where each potential value has the same chance).

When I take this more accurate concept of mean and try to apply it to the concept of standard deviation, I run into a problem -- my probabilities for that part of the standard deviation equation do not add up to 1: with an N-1 divisor I am effectively weighting N squared deviations by 1/(N-1) each. Of course, you can argue that each of the deviations corresponds to an actual test, and thus each has equal probability, so the concept behind "standard deviation" is justified. But the claim that one value is not random and the claim that all deviations have equal probability contradict each other.

Also, if I am dealing with real probability, I don't just tweak the denominator for the case where some outcome is known. I entirely remove that value from the sequence I'm taking the mean of, and handle it separately. [And, again, this doesn't seem like a valid justification for changing N to N-1 in standard deviation.]

So... I still have not been talked out of my preference for using SD (Substandard Deviation) in place of stddev (standard deviation) when I'm messing around with statistics. And, because of the behavior of 'stddev' and 'SD' in contexts where the answer is completely determined, I feel that SD is a better metric when the number of samples is small. I do, however, recognize the value of stddev when dealing with other people's summaries (where they have used standard deviation and I can't get at the raw data). Standards can be important for communication even when they don't have some other value.

-- Raul
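PS: To make the comparison concrete, here is a rough sketch in J (the names mean, dev, ssq, SD and stddev are mine, purely for illustration):

   mean   =: +/ % #
   dev    =: - mean            NB. deviations from the mean (a hook)
   ssq    =: +/ @: *: @: dev   NB. sum of squared deviations
   SD     =: %: @ (ssq % #)    NB. substandard deviation: divide by N
   stddev =: %: @ (ssq % <:@#) NB. the usual estimator: divide by N-1
   SD 2 4 4 4 5 5 7 9
2
   stddev 2 4 4 4 5 5 7 9
2.13809

With a single observation both ssq and <:@# are zero, so stddev is mathematically 0 divided by 0 (J's convention that 0 % 0 is 0 masks this in an actual session), while SD gives a plain 0.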
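My "deviation of the mean from itself" argument looks like this in the same terms: appending the mean adds a zero deviation, so the sum of squares is untouched, and discounting the mean's own entry in the tally hands back plain N:

   v =: 2 4 4 4 5 5 7 9
   ssq v
32
   ssq v , mean v        NB. the mean deviates from itself by 0
32
   <: # v , mean v       NB. N+1 entries, minus one for the mean: plain N
8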
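And the more general mean, with +/@:* taking explicit probabilities on the left and %@# supplying the uniform case (wmean is, again, just a throwaway name):

   wmean =: +/@:*                NB. dyad: probabilities wmean values
   (%@# v) wmean v               NB. uniform weight 1 % 8 recovers +/ % #
5
   0.5 0.3 0.2 wmean 10 20 30    NB. an explicit, non-uniform distribution
17

The sticking point shows up when I try the same trick on the N-1 form: I would be weighting N squared deviations by 1 % N-1 each, and those weights sum to N % N-1, not 1.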
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm