On 6/27/07, Eldon Eller <[EMAIL PROTECTED]> wrote:

> Argument by reductio ad absurdum: Consider the standard deviation of a
> data set consisting of a single point. Using N for the denominator gives
> a standard deviation of zero, which is correct only if that point is the
> entire population. Using N-1 gives a standard deviation of 0%0, which is
> indeterminate, and is correct if that point is a sample from a larger
> population.

Not in J. 0 div 0 is 0.

And EEMCD is correct, as well as Rieboucinski.

You ALL miss several salient points, the least of which is that you cannot
HAVE a normal WITHOUT a deviation.

The metrics here do not exist in nature; they are artifice of man.

PICK one and USE it; then make sure the meaning you invest it with IS there.

A standardized metric for deviation is VERY useful, and extraordinarily
difficult to come up with across multiple populations.

BTW, I usually do not weight by _probability_; I weight by _frequency_.

But that is because I keep probability and statistics distinct (estimation
versus counting), even if the domains and ranges of these two fields are
1:1 onto.
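
The single-point case can be checked with a minimal sketch. It is written in Python rather than J, and every function name in it is hypothetical. It computes the standard deviation with N and with N-1 in the denominator, adopts J's convention that 0 divided by 0 is 0 for the N-1 case (plain Python would raise an error instead), and accepts optional integer frequencies, in the spirit of weighting by frequency rather than by probability.

```python
from math import sqrt


def j_divide(num, den):
    """Ordinary division, except 0/0 returns 0 as J's 0 % 0 does."""
    if num == 0 and den == 0:
        return 0.0
    return num / den


def sum_sq_dev(values, freqs=None):
    """Return (sum of squared deviations from the mean, N).

    freqs, if given, are integer counts: values[i] occurs freqs[i] times.
    """
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    ss = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs))
    return ss, n


def sd_n(values, freqs=None):
    """Standard deviation with N in the denominator."""
    ss, n = sum_sq_dev(values, freqs)
    return sqrt(ss / n)


def sd_n_minus_1(values, freqs=None):
    """Standard deviation with N-1 in the denominator, J-style 0/0 = 0."""
    ss, n = sum_sq_dev(values, freqs)
    return sqrt(j_divide(ss, n - 1))


print(sd_n([5.0]), sd_n_minus_1([5.0]))          # single point: both 0.0 here
print(sd_n([2, 4], [3, 1]), sd_n([2, 2, 2, 4]))  # frequencies vs. expanded data
```

With the J convention both versions report 0 for a single point; by contrast, Python's own statistics.stdev raises StatisticsError when given fewer than two data points.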

Raul Miller wrote:
> I've been thinking a bit more about standard deviation -- it's been
> quite a while since I've done anything much at this level, so I'm
> still going over the basics.
>
> Anyways, I still have not been convinced to abandon my concept
> of "substandard deviation" in favor of "standard deviation".
>
> For one thing, the reasoning for using N-1 in the denominator of
> standard deviation seems specious.
>
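For comparison, the usual textbook argument for N-1 (Bessel's correction) is about bias: when the data are independent draws from a population, dividing by N-1 makes the average of the variance estimate equal the population variance, while dividing by N falls short by a factor of (N-1)/N. Below is a small exact check, sketched in Python (not J) under the assumption of sampling with replacement; the population [0, 1, 2, 3] and the sample size 2 are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product


def variance(values, ddof):
    """Sum of squared deviations from the mean, divided by N - ddof.

    Assumes integer values so the arithmetic stays exact with Fraction.
    """
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


def average_estimate(population, sample_size, ddof):
    """Average of the estimator over every ordered sample drawn with replacement."""
    samples = list(product(population, repeat=sample_size))
    return sum(variance(s, ddof) for s in samples) / len(samples)


population = [0, 1, 2, 3]
true_var = variance(population, ddof=0)          # population variance, 5/4
print(average_estimate(population, 2, ddof=0))  # 5/8, i.e. (N-1)/N times 5/4
print(average_estimate(population, 2, ddof=1))  # 5/4, equal to true_var
```

This shows unbiasedness of the variance only; the square root of an unbiased variance is not an unbiased standard deviation, so it does not by itself settle which denominator is better for small samples.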
> For another, in most cases where it really matters (correlation, for
> example), it just cancels out.
>
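The cancellation can be seen in a short Python (not J) sketch; the helper name is hypothetical. Pearson correlation is the covariance divided by the product of the two standard deviations, so whichever divisor is chosen appears once in the numerator and, through the two square roots, once in the denominator.

```python
from math import sqrt


def pearson(xs, ys, ddof):
    """Pearson correlation; ddof=0 divides by N, ddof=1 divides by N-1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    d = n - ddof
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / d
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / d)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / d)
    # d divides cov once and sx * sy once (sqrt(d) each), so it cancels.
    return cov / (sx * sy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
print(pearson(xs, ys, ddof=0), pearson(xs, ys, ddof=1))  # equal up to rounding
```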
> For another, "standard deviation" jumps from a manageable quantity to
> an unknowable quantity when the result is determined, and it seems
> to me that it should be zero in that case.
>
> For another, my "qualitative justification" (that it's accounting for
> some unknown uncertainty in the mean) seems wrong for cases where we
> have a well understood model.
>
> To my mind, the reasoning for the N-1 factor in standard deviation
> may be validly applied to the mean -- if I include all the deviation
> terms AND the deviation of the mean itself from itself, I should
> exclude the count of the mean in the divisor.  But no one bothers
> to express it that way, so this seems an exercise in futility.
>
> Anyways, the traditional definition for mean in J is +/ % # -- and this
> is a nice definition.  However, statistically speaking that's taking
> advantage of a quirk -- that each chance is equally weighted. A more
> general definition of mean would be +/@:* where one argument is potential
> value and the other argument is its corresponding probability (and
> [EMAIL PROTECTED] is the probability we use where each potential value has
> the same chance).
>
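The two definitions of mean can be compared in a Python (not J) sketch; the function names are hypothetical. mean_count mirrors +/ % # (sum divided by tally), and mean_weighted mirrors the dyadic +/@:*, the sum of the item-by-item products of values and probabilities.

```python
def mean_count(values):
    """Like J's +/ % # : sum divided by the number of items."""
    return sum(values) / len(values)


def mean_weighted(values, probs):
    """Like J's +/@:* : sum of value times probability (probs sum to 1)."""
    return sum(v * p for v, p in zip(values, probs))


values = [3.0, 5.0, 10.0]
uniform = [1 / len(values)] * len(values)  # every value has the same chance
print(mean_count(values), mean_weighted(values, uniform))  # agree up to rounding
print(mean_weighted([0.0, 1.0], [0.25, 0.75]))             # unequal chances
```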
> When I take this more accurate concept of mean and try and apply it
> to the concept of standard deviation, I run into a problem -- my
> probabilities for that part of the standard deviation equation do not
> add up to 1.
>
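This observation can be made concrete: if each squared deviation is read as carrying weight 1/(N-1), the weights sum to N/(N-1) rather than 1, while the N-denominator weights sum to exactly 1. A Python (not J) sketch with a hypothetical helper:

```python
from fractions import Fraction


def implied_weights(n, ddof):
    """Weight each of n squared deviations receives when dividing by n - ddof."""
    return [Fraction(1, n - ddof)] * n


for n in (2, 3, 10):
    print(n, sum(implied_weights(n, 0)), sum(implied_weights(n, 1)))
# Dividing by N gives weights summing to 1; dividing by N-1 gives N/(N-1).
```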
> Of course, you can argue that each of the deviations corresponds to
> an actual test, and thus each has equal probability, so the concept
> behind "standard deviation" is justified.  But the concept that one
> value is not random and this concept (that all deviations have equal
> probability) contradict each other.
>
> Also, if I am dealing with real probability, I don't just tweak the
> denominator for the case where some case is known.  I entirely
> remove that value from the sequence I'm determining the mean of,
> and handle it separately.  [And, again, this doesn't seem like a
> valid justification for changing N to N-1 in standard deviation.]
>
> So... I still have not been talked out of my preference for using SD
> (Substandard Deviation) in place of stddev (standard deviation)
> when I'm messing around with statistics.
>
> And, because of the behavior of 'stddev' and 'SD' for contexts where
> the answer is completely determined, I feel that SD is a better
> metric when the number of samples is small.
>
> I do, however, recognize the value of stddev when dealing with other
> people's summaries (where they have used standard deviation and
> I can't get at the raw data).  Standards can be important for
> communication even when they don't have some other value.
>
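When a published summary gives the N-1 standard deviation together with N, the two conventions differ only by the factor sqrt((N-1)/N), so the raw data are not needed to switch between them. A Python (not J) sketch; the function names are hypothetical, and it assumes the summary really used N-1 and reports N.

```python
from math import sqrt


def population_sd_from_sample_sd(sample_sd, n):
    """Rescale an N-1 ("stddev") figure to the N ("SD") convention."""
    if n < 2:
        raise ValueError("need n >= 2 to undo the N-1 divisor")
    return sample_sd * sqrt((n - 1) / n)


def sample_sd_from_population_sd(pop_sd, n):
    """Rescale an N ("SD") figure to the N-1 ("stddev") convention."""
    if n < 2:
        raise ValueError("the N-1 figure involves 0/0 for n < 2")
    return pop_sd * sqrt(n / (n - 1))


print(population_sd_from_sample_sd(1.0, 4))  # about 0.866
```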
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm




--
Use Reply-To: & thread your email
after the first: or it may take a while, as
I get 2000 emails per day.
--

Roy A. Crabtree
UNC '76 gaa.lifer#
(For TN contact, email me to set up a confirmed date/time)
202-391-0765 voicemail inbound only

[When you hear/read/see/feel what a yehudi plays/writes/sculpts/holds]
[(n)either violinist {Menuhin} (n)or writer {"The Yehudi Principle"} (n)or molder (n)or older]
[you must strive/think/look/sense all of it, or you will miss the meanings of it all]


http://www.authorsden.com/royacrabtree
http://skyscraper.fortunecity.com/activex/720/resume/full.doc
--
(c) RAC/IP, ARE,PRO,PAST
(Copyright) Roy Andrew Crabtree/In Perpetuity
   All Rights/Reserved Explicitly
   Public Reuse Only
   Profits Always Safe Traded