Juan Zuluaga wrote:
> 
> ---------- Forwarded Message ----------
> Date: Wednesday, November 08, 2000 3:57 PM -0500
> From: Greg Adams <[EMAIL PROTECTED]>
> 
> Subject: important:  election results
> 
> As you probably all know, Bush has 1700 more votes in Florida over Gore.
> However, folks in Palm Beach were complaining that their ballots were
> confusing, and many people voted for Buchanan when they thought they were
> voting for Gore.  With the help of my wife Chris, I analyzed the county by
> county presidential results for Florida.  The results are clear:  the ballot
> for Palm Beach cost Gore approximately 2200 votes.  A simple regression of
> Buchanan's vote on Bush's vote shows that Buchanan should have only gotten
> 800 votes, not 3400.
> 
> Don't believe me?  Look for yourself:  It's not even close!  Palm Beach is
> an outlier beyond all belief!!!


        Hoooold on. Look at the pictures again:

http://madison.hss.cmu.edu/palm-beach.pdf

        THESE DATA ARE NOT NORMALLY DISTRIBUTED. Nor does the assumption of
homoscedasticity even begin to apply. Moreover, the "Bush" and "Gore"
numbers are mostly proxies for county size, which is roughly
logsymmetric. Doing least-squares regression on these data is just
meaningless.



Histogram of Pop   N = 67

Midpoint        Count
       0           30  ******************************
   50000           15  ***************
  100000            8  ********
  150000            4  ****
  200000            3  ***
  250000            1  *
  300000            1  *
  350000            1  *
  400000            1  *
  450000            1  *
  500000            0
  550000            1  *
  600000            1  *


Histogram of logPop   N = 67

Midpoint        Count
     8.0            4  ****
     8.5            9  *********
     9.0            9  *********
     9.5            4  ****
    10.0            6  ******
    10.5            2  **
    11.0           12  ************
    11.5            7  *******
    12.0            6  ******
    12.5            4  ****
    13.0            3  ***
    13.5            1  *
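
A rough numerical check of the "roughly logsymmetric" claim, as a sketch only: it uses nothing but the bin midpoints and counts printed in the two histograms above, and it treats every county as if it sat exactly at its bin midpoint (an approximation, since the raw county figures are not in this post). The function name binned_skewness is made up for the illustration.

```python
# Bin midpoints and counts copied from the two histograms above.
pop_bins = [(0, 30), (50000, 15), (100000, 8), (150000, 4), (200000, 3),
            (250000, 1), (300000, 1), (350000, 1), (400000, 1), (450000, 1),
            (500000, 0), (550000, 1), (600000, 1)]
logpop_bins = [(8.0, 4), (8.5, 9), (9.0, 9), (9.5, 4), (10.0, 6), (10.5, 2),
               (11.0, 12), (11.5, 7), (12.0, 6), (12.5, 4), (13.0, 3), (13.5, 1)]


def binned_skewness(bins):
    """Moment skewness, placing each observation at its bin midpoint."""
    n = sum(count for _, count in bins)
    mean = sum(mid * count for mid, count in bins) / n
    m2 = sum(count * (mid - mean) ** 2 for mid, count in bins) / n
    m3 = sum(count * (mid - mean) ** 3 for mid, count in bins) / n
    return m3 / m2 ** 1.5


skew_pop = binned_skewness(pop_bins)
skew_logpop = binned_skewness(logpop_bins)
print(f"skewness of Pop    (binned): {skew_pop:.2f}")
print(f"skewness of logPop (binned): {skew_logpop:.2f}")
```

On these binned counts the raw Pop figures come out strongly right-skewed, while logPop comes out close to symmetric, which is the point being made about county size.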



If we use _proportion_ of the vote per county, we get much more
plausible looking distributions for Bush and Gore:

Histogram


Histogram of Gorepro   N = 67

Midpoint        Count
    0.25            3  ***
    0.30            5  *****
    0.35            9  *********
    0.40           12  ************
    0.45           20  ********************
    0.50            9  *********
    0.55            4  ****
    0.60            2  **
    0.65            2  **
    0.70            1  *

MTB > hist c36

Histogram


Histogram of Bushpro   N = 67

Midpoint        Count
    0.30            1  *
    0.35            2  **
    0.40            2  **
    0.45            5  *****
    0.50           10  **********
    0.55           19  *******************
    0.60           12  ************
    0.65            8  ********
    0.70            5  *****
    0.75            3  ***

MTB > hist c37


but the proportions for Pat Buchanan are still far from normal. (Should
I rephrase that? Naaaah.) 

Histogram of PBpro   N = 67

Midpoint        Count
   0.000            1  *
   0.002           23  ***********************
   0.004           20  ********************
   0.006           13  *************
   0.008            4  ****
   0.010            2  **
   0.012            2  **
   0.014            0
   0.016            1  *
   0.018            1  *

By the way, those two high-tail points are NOT Palm Beach, but the
(small) Calhoun and Liberty counties.  If we log-transform we get a
reasonably symmetric distribution:

Histogram of logPBpro   N = 67

Midpoint        Count
    -7.2            1  *
    -6.8            1  *
    -6.4            8  ********
    -6.0           14  **************
    -5.6           15  ***************
    -5.2           17  *****************
    -4.8            6  ******
    -4.4            3  ***
    -4.0            2  **


We plot *that* against the proportion of Bush or Gore votes (essentially
equivalent) and we get (Palm Beach marked with a "P"):


MTB > plot c39*c36

Plot


     -4.0+                                 *
         -                                 *
logPBpro -                                     *    *
         -                                                 *
         -        P                                 *  *    *
     -5.0+                               ** *  *   *      *      *
         -                   *        * * 2*        *
         -                         2  * *  *     **  *   *
         -                       *              *       * * *      *
         -           *   *      *        2 * ***                 *
     -6.0+    *                  2   *  **    *
         -                    *        *       *
         -                    *   *        *   *
         -  *                      *       *             *
         -
     -7.0+                     *
         -
          --+---------+---------+---------+---------+---------+----Bushpro
         0.320     0.400     0.480     0.560     0.640     0.720

And regress:

The regression equation is
logPBpro = - 6.80 + 2.25 Bushpro

Predictor        Coef       StDev          T        P
Constant      -6.8005      0.4440     -15.31    0.000
Bushpro        2.2484      0.7821       2.88    0.005

S = 0.5867      R-Sq = 11.3%     R-Sq(adj) = 9.9%

Analysis of Variance

Source            DF          SS          MS         F        P
Regression         1      2.8457      2.8457      8.27    0.005
Residual Error    65     22.3777      0.3443
Total             66     25.2234

Unusual Observations
Obs    Bushpro        C39         Fit   StDev Fit    Residual    St Resid
  6      0.314    -6.5731     -6.0945      0.2056     -0.4786      -0.87 X
  7      0.561    -4.0407     -5.5384      0.0717      1.4977       2.57R
 11      0.668    -6.6086     -5.2985      0.1106     -1.3101      -2.27R
 13      0.468    -7.0057     -5.7486      0.1018     -1.2570      -2.18R
 20      0.331    -5.9082     -6.0565      0.1932      0.1483       0.27 X
 39      0.556    -4.1054     -5.5499      0.0718      1.4445       2.48R
 [Palm Beach is #50]
 50      0.359    -4.8267     -5.9923      0.1727      1.1656       2.08R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

Nothing much going on here: Palm Beach sits about 2 SDs out in a slightly
heavy-tailed distribution, and it is by no means the most extreme outlier.
Residual plot (Palm Beach again indicated):


 C40     -
         -                                 2
         -        P
      1.0+                                     *
         -                                          *
         -                   *           *             *   **
         -                         *  * * 2**  *   **
         -    *          *       * *  * * **     ** *     *
      0.0+           *          *               *    *           *
         -                       2       *   *          ***
         -  *                 *      *  *2 *  **            *      *
         -                    *        *      **                 *
         -                        **       *
     -1.0+                                 *   *
         -                     *
         -                                               *
         -
          --+---------+---------+---------+---------+---------+----Bushpro
           0.320     0.400     0.480     0.560     0.640     0.720



More interesting, however, is if we regress the transformed Buchanan
proportion on log of county size *and* the main vote split. Not only
does the R-squared rise dramatically (from 11.3% to 45.0%), indicating
that Buchanan has mainly rural support:




MTB > plot c39*c18

Plot


     -4.0+            *
         -      *
 C39     -              * *
         -               *
         -                2  *                             *
     -5.0+          22          **       *
         -            **  *  *    * *          *
         -              2* *     *        2     * *
         -      *       *          *       2*
         -        *       *         * *   ** ***   *
     -6.0+                     *          *     *    * * **
         -                               *         **
         -                                 * *      *  *
         -                            *       *  *           *
         -
     -7.0+                                                    *
         -
          +---------+---------+---------+---------+---------+------logPop
         7.2       8.4       9.6      10.8      12.0      13.2






 but Palm Beach suddenly pops off the charts as an outlier:


MTB > regress c39 2 c36 c18;
SUBC> resids c40.

Regression Analysis


The regression equation is
logPBpro = - 3.39 + 0.828 Bushpro - 0.252 logPop

Predictor        Coef       StDev          T        P
Constant      -3.3894      0.6491      -5.22    0.000
Bushpro        0.8281      0.6610       1.25    0.215
logPop       -0.25208     0.04028      -6.26    0.000

S = 0.4657      R-Sq = 45.0%     R-Sq(adj) = 43.2%

Analysis of Variance

Source            DF          SS          MS         F        P
Regression         2     11.3407      5.6704     26.14    0.000
Residual Error    64     13.8826      0.2169
Total             66     25.2234

Source       DF      Seq SS
Bushpro       1      2.8457
logPop        1      8.4951

Unusual Observations
Obs    Bushpro        C39         Fit   StDev Fit    Residual    St Resid
  6      0.314    -6.5731     -6.4678      0.1737     -0.1052      -0.24 X
  7      0.561    -4.0407     -5.0774      0.0931      1.0367       2.27R
 20      0.331    -5.9082     -5.5282      0.1751     -0.3800      -0.88 X
 22      0.563    -5.8950     -4.9631      0.1077     -0.9319      -2.06R
 44      0.493    -6.5416     -5.6008      0.0722     -0.9408      -2.04R
 50      0.359    -4.8267     -6.3587      0.1490      1.5320       3.47R

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
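
For anyone who wants to check the "St Resid" column by hand, here is a minimal sketch. It assumes Minitab's standardized residual is the usual internally studentized one, residual / (S * sqrt(1 - h)), with the leverage h recovered from the printed "StDev Fit" as h = (StDev Fit / S)^2. The formula itself is textbook; that Minitab uses it is my inference from the fact that it reproduces the printed values. The function name is made up for the illustration.

```python
import math


def standardized_residual(residual, s, stdev_fit):
    """residual / (s * sqrt(1 - h)), with leverage h = (stdev_fit / s)**2."""
    leverage = (stdev_fit / s) ** 2
    return residual / (s * math.sqrt(1.0 - leverage))


# Palm Beach (obs 50); values copied from the two regression outputs above.
one_predictor = standardized_residual(residual=1.1656, s=0.5867, stdev_fit=0.1727)
two_predictor = standardized_residual(residual=1.5320, s=0.4657, stdev_fit=0.1490)

print(f"Bushpro only:     {one_predictor:.2f}")   # table says 2.08
print(f"Bushpro + logPop: {two_predictor:.2f}")   # table says 3.47
```

The raw residual grows from 1.17 to 1.53 while S shrinks from 0.59 to 0.47, which is why Palm Beach moves from unremarkable to the largest standardized residual in the table.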


MTB > plot c40*c18  #residuals against log of population

Plot


     1.60+                                                 *
         -
 C40     -
         -
         -            *
     0.80+      *
         -              * *                    *
         -               *   *           *        *
         -                2     **  *     *     *         *
         -           *       *    *       ** * *   * *   *
     0.00+          2* *  *      *         ** *     *  *
         -            * ** *       **     2*    *            *
         -              *      *      *            *   *
         -              *                *          *
         -      *                          * *   *            *
    -0.80+                *
         -        *                   *       *
          +---------+---------+---------+---------+---------+------logPop
         7.2       8.4       9.6      10.8      12.0      13.2

Histogram


Histogram of C40   N = 67   #residuals of logPBpro against logpop and Bushpro

Midpoint        Count
    -1.0            2  **
    -0.8            3  ***
    -0.6            5  *****
    -0.4            4  ****
    -0.2           14  **************
     0.0           13  *************
     0.2           12  ************
     0.4            7  *******
     0.6            4  ****
     0.8            1  *
     1.0            1  *
     1.2            0
     1.4            0
     1.6            1  *


        *This*, I think, is evidence that Palm Beach does not fit the pattern;
and the assumptions of normality and homoscedasticity are well
supported.
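
The R-squared figures quoted in the two outputs can be re-derived from the ANOVA tables, which makes it easy to see how much of the improvement is due to logPop. A small sketch using only the sums of squares printed above (the helper names are made up for the illustration):

```python
import math


def r_squared(ss_regression, ss_total):
    """Share of the total sum of squares explained by the regression."""
    return ss_regression / ss_total


def adjusted_r_squared(ss_error, df_error, ss_total, df_total):
    """R-squared penalized for the number of predictors."""
    return 1.0 - (ss_error / df_error) / (ss_total / df_total)


SS_TOTAL, DF_TOTAL = 25.2234, 66

# logPBpro on Bushpro only
r2_one = r_squared(2.8457, SS_TOTAL)
adj_one = adjusted_r_squared(22.3777, 65, SS_TOTAL, DF_TOTAL)

# logPBpro on Bushpro and logPop
r2_two = r_squared(11.3407, SS_TOTAL)
adj_two = adjusted_r_squared(13.8826, 64, SS_TOTAL, DF_TOTAL)

# Partial F for adding logPop: its sequential SS over the residual mean square.
# Its square root should match |T| for logPop in the coefficient table.
partial_f = 8.4951 / (13.8826 / 64)

print(f"R-Sq: {100 * r2_one:.1f}% -> {100 * r2_two:.1f}%")
print(f"R-Sq(adj): {100 * adj_one:.1f}% -> {100 * adj_two:.1f}%")
print(f"partial F for logPop: {partial_f:.2f}, sqrt = {math.sqrt(partial_f):.2f}")
```

Almost all of the explained sum of squares (8.50 of 11.34) comes in with logPop, which fits the observation that Bushpro stops being significant once county size is in the model.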

We can do the same regression on *only* log population:

MTB > regress c39 1 c18;
SUBC> resids c40
* NOTE * Subcommand does not end in . or ; (; assumed).
SUBC> .

Regression Analysis


The regression equation is
C39 = - 2.75 - 0.269 logPop

Predictor        Coef       StDev          T        P
Constant      -2.7455      0.3983      -6.89    0.000
logPop       -0.26941     0.03800      -7.09    0.000

S = 0.4678      R-Sq = 43.6%     R-Sq(adj) = 42.7%

Analysis of Variance

Source            DF          SS          MS         F        P
Regression         1      11.000      11.000     50.27    0.000
Residual Error    65      14.223       0.219
Total             66      25.223

Unusual Observations
Obs     logPop        C39         Fit   StDev Fit    Residual    St Resid
  7        8.5    -4.0407     -5.0464      0.0901      1.0057       2.19R
 22        8.1    -5.8950     -4.9257      0.1039     -0.9694      -2.13R
 44       10.4    -6.5416     -5.5452      0.0572     -0.9964      -2.15R
 50       13.0    -4.8267     -6.2372      0.1137      1.4104       3.11R

MTB > plot c40*c18  #residuals of logPBpro against logpop

Plot


         -                                                 *
 C40     -
         -
         -            *
     0.80+      *
         -              ***              *     *
         -                *  *                  * *
         -                *     **  *     2        *      *
         -           *            *        2* *      * * *
     0.00+          **    *  *   *        *  * *
         -          * **** *       *       *    *   *
         -              2           * *   *        **  *     *
         -                               *
         -      *              *           * *   *            *
    -0.80+                *                   *
         -        *                   *
         -
          +---------+---------+---------+---------+---------+------logPop
         7.2       8.4       9.6      10.8      12.0      13.2


Just out of interest, IF we assume that the residual arises from ballot
errors, how many votes does it represent? A factor of exp(1.41) = 4.1
would suggest that about 75% of the 3407 Buchanan votes were not
intended, so that roughly 2500 to 2600 were intended for Gore.
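
The arithmetic behind that estimate, spelled out as a short sketch. It takes the Palm Beach residual (1.4104) from the logPop-only regression and the 3407 Buchanan votes quoted above, and it leans on the same big IF: that the whole residual is ballot error rather than anything else peculiar to the county.

```python
import math

residual = 1.4104        # Palm Beach residual on the log scale (logPop-only fit)
observed_votes = 3407    # Buchanan votes in Palm Beach, as quoted above

# A residual on the log scale is a ratio on the original scale:
# observed share / predicted share = exp(residual).
factor = math.exp(residual)

predicted_votes = observed_votes / factor
excess_votes = observed_votes - predicted_votes
excess_fraction = excess_votes / observed_votes

print(f"factor:          {factor:.2f}")
print(f"predicted votes: {predicted_votes:.0f}")
print(f"excess votes:    {excess_votes:.0f} ({100 * excess_fraction:.0f}% of observed)")
```

The point estimate is nothing more than exp() of one residual, so it inherits all the scatter visible in the residual plot and should be read as an order of magnitude, not a count.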


        Yrs aye,
                Robert Dawson


