I've been thinking a bit more about standard deviation -- it's been
quite a while since I've done anything much at this level, so I'm
still going over the basics.

Anyway, I still have not been convinced to abandon my concept
of "substandard deviation" in favor of "standard deviation".

For one thing, the reasoning for using N-1 in the denominator of
standard deviation seems specious.
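
For reference, here's a rough sketch of both definitions in J (the
names mean and ssdev, and the sample data, are just mine for this
discussion):

   mean   =: +/ % #
   ssdev  =: +/ @: *: @: (- mean)    NB. sum of squared deviations
   SD     =: %: @ (ssdev % #)        NB. divide by N   ("substandard")
   stddev =: %: @ (ssdev % <:@#)     NB. divide by N-1 (the usual one)
   data   =: 2 4 4 4 5 5 7 9
   (SD data) , stddev data
2 2.13809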

For another, in most cases where it really matters (correlation, for
example), it just cancels out.
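
For example, here's a rough sketch of Pearson correlation written
both ways (the verbs and the data are made up purely to show the
cancellation):

   dev   =: - (+/ % #)                          NB. deviations from the mean
   covN  =: 4 : '(+/ (dev x) * dev y) % # x'    NB. covariance over N
   covN1 =: 4 : '(+/ (dev x) * dev y) % <: # x' NB. covariance over N-1
   sdN   =: %: @ ((+/ @: *: @: dev) % #)
   sdN1  =: %: @ ((+/ @: *: @: dev) % <:@#)
   xs =: 1 2 3 4 5
   ys =: 2 4 5 4 5
   (xs covN ys) % (sdN xs) * sdN ys             NB. N everywhere
0.774597
   (xs covN1 ys) % (sdN1 xs) * sdN1 ys          NB. N-1 everywhere
0.774597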

For another, "standard deviation" jumps from a manageable quantity to
an unknowable quantity when the result is determined, and it seems
to me that it should be zero in that case.
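
Concretely, for a single observation (a tiny sketch):

   mean =: +/ % #
   dev  =: - mean
   samp =: ,7              NB. one observation, fully determined
   (+/ *: dev samp) , (# samp) , <: # samp
0 1 0

The sum of squared deviations is 0, N is 1 and N-1 is 0, so dividing
by N gives 0 while dividing by N-1 is 0 over 0.  (J itself happens to
define 0 % 0 as 0, so the indeterminacy is in the math, not in the
interpreter.)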

For another, my "qualitative justification" (that it's accounting for some
unknown uncertainty in the mean) seems wrong for cases where we
have a well understood model.

To my mind, the reasoning for the N-1 factor in standard deviation
may be validly applied to the mean -- if I include all N of the
deviation terms AND the deviation of the mean itself from itself
(which is zero), I should exclude the count of the mean in the
divisor, which brings the divisor back to N.  But no one bothers to
express it that way, so this seems an exercise in futility.
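
In other words, something like this sketch (reusing the made-up data
from the earlier sketch):

   mean  =: +/ % #
   dev   =: - mean
   data  =: 2 4 4 4 5 5 7 9
   terms =: (*: dev data) , 0    NB. N squared deviations, and the mean's 0
   %: (+/ terms) % <: # terms    NB. drop the mean's count: divide by N
2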

Anyway, the traditional definition of mean in J is +/ % # -- and this
is a nice definition.  However, statistically speaking, that takes
advantage of a quirk: that each value is equally weighted.  A more
general definition of mean would be +/@:* where one argument is the
potential values and the other argument is their corresponding
probabilities (and when each potential value has the same chance, the
probability we use for each is simply the reciprocal of the count).
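
A small sketch of that, with made-up values and weights:

   mean  =: +/ % #
   wmean =: +/@:*                    NB. dyad: values wmean probabilities
   vals  =: 2 3 7 10
   vals wmean 0.25 0.25 0.25 0.25    NB. equal chances: 1 % number of values
5.5
   mean vals                         NB. agrees with +/ % #
5.5
   vals wmean 0.1 0.2 0.3 0.4        NB. a genuinely weighted mean
6.9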

When I take this more accurate concept of mean and try to apply it
to the concept of standard deviation, I run into a problem -- the
implied probabilities in that part of the standard deviation
equation (the weights on the squared deviations) do not add up to 1.
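
To make that concrete, treat the denominator as an implied weight on
each squared deviation (a rough sketch, reusing the sample data from
above):

   data =: 2 4 4 4 5 5 7 9
   n    =: # data
   +/ n # % n               NB. N weights of 1%N: they sum to 1
1
   +/ n # % <: n            NB. N weights of 1%(N-1): they sum to N%(N-1)
1.14286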

Of course, you can argue that each of the deviations corresponds to
an actual test, and thus each has equal probability, so the concept
behind "standard deviation" is justified.  But the notion that one
value is not random and the notion that all deviations have equal
probability contradict each other.

Also, if I am dealing with real probability, I don't just tweak the
denominator for the case where some value is known.  I entirely
remove that value from the sequence I'm determining the mean of,
and handle it separately.  [And, again, this doesn't seem like a
valid justification for changing N to N-1 in standard deviation.]
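
Something like this sketch, say, where I pretend the first item of
some made-up data is the known, non-random value:

   mean  =: +/ % #
   SD    =: %: @ ((+/ @: *: @: (- mean)) % #)
   data  =: 3 2 4 4 4 5 5 7 9    NB. hypothetical: the leading 3 is known
   known =: {. data              NB. set the known value aside
   SD }. data                    NB. substandard deviation of the rest
2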

So... I still have not been talked out of my preference for using SD
(Substandard Deviation) in place of stddev (standard deviation)
when I'm messing around with statistics.

And, because of the behavior of 'stddev' and 'SD' for contexts where
the answer is completely determined, I feel that SD is a better
metric when the number of samples is small.

I do, however, recognize the value of stddev when dealing with other
people's summaries (where they have used standard deviation and
I can't get at the raw data).  Standards can be important for
communication even when they don't have some other value.

--
Raul