aglucks wrote:
> 
> I was always taught that at least 3 (three) data points are needed to
> calculate a statistically valid, meaningful standard deviation.
> 
> What is the value of a standard deviation that is calculated using
> only 2 data points?

        It depends on which meaning of "value" you have in mind.


        The numerical value is  |x1 - x2| / sqrt(2)  
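
        That expression follows from the usual n-1 definition. With two
points the mean is (x1 + x2)/2, each deviation is +-(x1 - x2)/2, the
squared deviations sum to (x1 - x2)^2 / 2, and dividing by n-1 = 1 and
taking the square root gives |x1 - x2| / sqrt(2). Here is a quick check
in Python (not part of the MINITAB session; the helper name
sd_two_points and the two data values are made up for illustration):

```python
import math
import statistics


def sd_two_points(x1, x2):
    """Sample SD (n-1 denominator) of two points, via the closed form."""
    return abs(x1 - x2) / math.sqrt(2)


# Arbitrary illustrative values, not data from the post.
x1, x2 = 3.0, 7.5

closed_form = sd_two_points(x1, x2)
library = statistics.stdev([x1, x2])  # n-1 sample standard deviation

# The two numbers should agree up to floating-point rounding.
print(closed_form, library)
```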


        The practical value is practically nothing, as is also the case for
n=3, 4, ... to a diminishing extent.  See the following simulations
(in MINITAB, my fave stats-hacking tool):

        #anything after a # is a comment!
        #anything after a MTB > is what I typed
        #MINITAB does the rest automagically

MTB > rand 1000 c1 c2    #1000 samples of size 2 (normal, mean 0, SD 1)
MTB > rstdev c1-c2 c10   #Estimate the SD from the sample
MTB > boxp c10           #boxplot
        

Boxplot

      -------------
 -----I    +      I----------------  ******     *** 
      -------------
 +---------+---------+---------+---------+---------+------C10     
0.00      0.70      1.40      2.10      2.80      3.50



MTB > rand 1000 c1-c3           #size 3
MTB > rstdev c1-c3 c10
MTB > boxp c10
        

Boxplot


              --------------
    ----------I     +      I-------------------*** 
              --------------
   +---------+---------+---------+---------+---------+------C10     
  0.00      0.50      1.00      1.50      2.00      2.50

MTB > rand 1000 c1-c4           #size 4
MTB > rstdev c1-c4 c10
MTB > boxp c10
        

Boxplot


                 ---------------
    -------------I     +       I------------------  * * ** 
                 ---------------
   --------+---------+---------+---------+---------+--------C10     
        0.40      0.80      1.20      1.60      2.00


        Note that in every case half the computed standard
deviations lie outside the "box". Thus, roughly, half the values
are either less than the first fraction or greater than the second
fraction of the true value:

        n       less than       greater than
        2         1/3               4/3
        3         1/2               5/4
        4         2/3               6/5
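
        If you don't have MINITAB handy, the box edges (the quartiles of
the simulated SDs) can be re-created with a rough Python sketch. It is
a stand-in for the rand / rstdev / boxp steps above, not a translation
of them. The function name sd_quartiles, the seed and the default of
1000 repetitions are my own choices for illustration, and the numbers
will wobble a little from run to run if you change the seed.

```python
import random
import statistics


def sd_quartiles(n, reps=1000, seed=0):
    """Simulate `reps` samples of size n from N(0, 1) and return the
    lower quartile, median and upper quartile of the sample SDs."""
    rng = random.Random(seed)  # private generator, so runs are repeatable
    sds = [
        statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    q1, q2, q3 = statistics.quantiles(sds, n=4)  # three quartile cut points
    return q1, q2, q3


for n in (2, 3, 4):
    q1, q2, q3 = sd_quartiles(n)
    # Half of the simulated SDs fall below Q1 or above Q3.
    print(f"n={n}  Q1={q1:.2f}  median={q2:.2f}  Q3={q3:.2f}")
```

Since the true SD is 1, Q1 and Q3 can be read directly as fractions of
the true value.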

        Not good for much, really.  It looks better by the time
you get up to (say) n=10:

MTB > rand 1000 c1-c10          #size 10
MTB > rstdev c1-c10 c15
MTB > boxp c15
        

Boxplot

                        -------------
     *------------------I     +     I------------------*** 
                        -------------
   --------+---------+---------+---------+---------+--------C15     
         0.50      0.75      1.00      1.25      1.50


and by n=30 you're within +- 10% of the true value about half the time:

MTB > rand 1000 c1-c30
MTB > rstdev c1-c30 c35
MTB > boxp c35
        

Boxplot


                          ----------
          ** *------------I    +   I------------- *    * 
                          ----------
   --------+---------+---------+---------+---------+--------C35     
         0.60      0.80      1.00      1.20      1.40
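
        The "within +- 10% half the time" claim can also be checked by
counting directly instead of reading it off a boxplot. Again this is
only a sketch in Python, assuming normal data with true SD 1 as in the
simulations above; fraction_within, its tolerance argument and the
seed are names and settings I made up for the purpose.

```python
import random
import statistics


def fraction_within(n, tol=0.10, reps=1000, seed=0):
    """Fraction of simulated sample SDs within +-tol of the true SD (1.0)."""
    rng = random.Random(seed)  # private generator, so runs are repeatable
    hits = 0
    for _ in range(reps):
        s = statistics.stdev([rng.gauss(0.0, 1.0) for _ in range(n)])
        if abs(s - 1.0) <= tol:
            hits += 1
    return hits / reps


for n in (2, 10, 30):
    print(f"n={n:2d}  fraction within 10% of true SD: {fraction_within(n):.2f}")
```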


        So nothing magical happens at n=3; you *can* compute an SD
for n=2, but it won't be much good until about n=10.


                -Robert Dawson
