Juan Zuluaga wrote:
>
> ---------- Forwarded Message ----------
> Date: Wednesday, November 08, 2000 3:57 PM -0500
> From: Greg Adams <[EMAIL PROTECTED]>
>
> Subject: important: election results
>
> As you probably all know, Bush has 1700 more votes in Florida over Gore.
> However, folks in Palm Beach were complaining that their ballots were
> confusing, and many people voted for Buchanan when they thought they were
> voting for Gore. With the help of my wife Chris, I analyzed the county by
> county presidential results for Florida. The results are clear: the ballot
> for Palm Beach cost Gore approximately 2200 votes. A simple regression of
> Buchanan's vote on Bush's vote shows that Buchanan should have only gotten
> 800 votes, not 3400.
>
> Don't believe me? Look for yourself: It's not even close! Palm Beach is
> an outlier beyond all belief!!!
Hoooold on. Look at the pictures again:
http://madison.hss.cmu.edu/palm-beach.pdf
THESE DATA ARE NOT NORMALLY DISTRIBUTED. Nor does the assumption of
homoscedasticity even begin to apply. Moreover, the "Bush" and "Gore"
numbers are mostly proxies for county size, which is roughly
log-symmetric (roughly symmetric on a log scale, as the histograms below
show). Doing least-squares regression on these raw counts is just
meaningless.
Histogram of Pop N = 67
Midpoint Count
0 30 ******************************
50000 15 ***************
100000 8 ********
150000 4 ****
200000 3 ***
250000 1 *
300000 1 *
350000 1 *
400000 1 *
450000 1 *
500000 0
550000 1 *
600000 1 *
Histogram of logPop N = 67
Midpoint Count
8.0 4 ****
8.5 9 *********
9.0 9 *********
9.5 4 ****
10.0 6 ******
10.5 2 **
11.0 12 ************
11.5 7 *******
12.0 6 ******
12.5 4 ****
13.0 3 ***
13.5 1 *
If we use _proportion_ of the vote per county, we get much more
plausible-looking distributions for Bush and Gore:
Histogram
Histogram of Gorepro N = 67
Midpoint Count
0.25 3 ***
0.30 5 *****
0.35 9 *********
0.40 12 ************
0.45 20 ********************
0.50 9 *********
0.55 4 ****
0.60 2 **
0.65 2 **
0.70 1 *
MTB > hist c36
Histogram
Histogram of Bushpro N = 67
Midpoint Count
0.30 1 *
0.35 2 **
0.40 2 **
0.45 5 *****
0.50 10 **********
0.55 19 *******************
0.60 12 ************
0.65 8 ********
0.70 5 *****
0.75 3 ***
MTB > hist c37
but the proportions for Pat Buchanan are still far from normal. (Should
I rephrase that? Naaaah.)
Histogram of PBpro N = 67
Midpoint Count
0.000 1 *
0.002 23 ***********************
0.004 20 ********************
0.006 13 *************
0.008 4 ****
0.010 2 **
0.012 2 **
0.014 0
0.016 1 *
0.018 1 *
By the way, those two high-tail points are NOT Palm Beach, but the
(small) Calhoun and Liberty counties. If we log-transform, we get a
reasonably symmetric distribution:
Histogram of logPBpro N = 67
Midpoint Count
-7.2 1 *
-6.8 1 *
-6.4 8 ********
-6.0 14 **************
-5.6 15 ***************
-5.2 17 *****************
-4.8 6 ******
-4.4 3 ***
-4.0 2 **
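A quick way to see what the log transform does is to compare sample skewness before and after. The sketch below uses the plain moment estimator g1 = m3/m2^1.5 on a small set of made-up, right-skewed proportions (hypothetical, not the Florida county figures). It uses the natural log, which matches the scale of the logPBpro values: exp(-4.83) is about 0.008.

```python
import math


def sample_skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed proportions, made up for illustration only.
props = [0.002, 0.003, 0.003, 0.004, 0.004, 0.005, 0.006, 0.008, 0.012, 0.018]
logs = [math.log(p) for p in props]  # natural log

print(f"raw skewness: {sample_skewness(props):.2f}")
print(f"log skewness: {sample_skewness(logs):.2f}")
```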
We plot *that* against the proportion of Bush or Gore votes (essentially
equivalent) and get the following (Palm Beach marked with a "P"):
MTB > plot c39*c36
[Character plot: logPBpro (vertical, -7.0 to -4.0) against Bushpro (horizontal, 0.320 to 0.720); Palm Beach ("P") sits near the top, at about -4.8]
And regress:
The regression equation is
logPBpro = - 6.80 + 2.25 Bushpro
Predictor Coef StDev T P
Constant -6.8005 0.4440 -15.31 0.000
Bushpro 2.2484 0.7821 2.88 0.005
S = 0.5867 R-Sq = 11.3% R-Sq(adj) = 9.9%
Analysis of Variance
Source DF SS MS F P
Regression 1 2.8457 2.8457 8.27 0.005
Residual Error 65 22.3777 0.3443
Total 66 25.2234
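The summary line follows directly from the ANOVA table: R-Sq = SS(Regression)/SS(Total), S is the square root of the residual mean square, R-Sq(adj) compares mean squares instead of sums of squares, and F = MS(Regression)/MS(Residual Error). A minimal sketch that recomputes them from the printed sums of squares (the function name is only illustrative):

```python
import math


def anova_summary(ss_reg, ss_resid, df_reg, df_resid):
    """Recompute Minitab's S, R-Sq, R-Sq(adj) and F from an ANOVA table."""
    ss_total = ss_reg + ss_resid
    df_total = df_reg + df_resid
    ms_reg = ss_reg / df_reg
    ms_resid = ss_resid / df_resid  # residual mean square (MSE)
    return {
        "S": math.sqrt(ms_resid),
        "R-Sq": 100 * ss_reg / ss_total,
        # Adjusted R-Sq compares mean squares rather than sums of squares.
        "R-Sq(adj)": 100 * (1 - ms_resid / (ss_total / df_total)),
        "F": ms_reg / ms_resid,
    }


# Sums of squares and degrees of freedom from the table above.
for name, value in anova_summary(2.8457, 22.3777, 1, 65).items():
    print(f"{name}: {value:.4g}")
```

The same function applied to the later tables (for example 11.3407, 13.8826, 2, 64) should reproduce their summary lines as well.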
Unusual Observations
Obs  Bushpro       C39       Fit  StDev Fit   Residual   St Resid
  6    0.314   -6.5731   -6.0945     0.2056    -0.4786      -0.87 X
  7    0.561   -4.0407   -5.5384     0.0717     1.4977       2.57R
 11    0.668   -6.6086   -5.2985     0.1106    -1.3101      -2.27R
 13    0.468   -7.0057   -5.7486     0.1018    -1.2570      -2.18R
 20    0.331   -5.9082   -6.0565     0.1932     0.1483       0.27 X
 39    0.556   -4.1054   -5.5499     0.0718     1.4445       2.48R
[Palm Beach is #50]
 50    0.359   -4.8267   -5.9923     0.1727     1.1656       2.08R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
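The R and X flags can be rebuilt from the printed columns. In least squares, StDev Fit = S*sqrt(h), where h is the observation's leverage, so h = (StDev Fit/S)^2 and the standardized residual is Residual/(S*sqrt(1 - h)). This sketch assumes two cutoffs: R means |St Resid| > 2, and X means h exceeds the smaller of 3p/n and 0.99 (my understanding of Minitab's rule, with p the number of coefficients including the constant). With those cutoffs it reproduces the flags above; Palm Beach (#50) has h of about 0.087, just under 3*2/67, or about 0.090.

```python
import math


def diagnose(residual, stdev_fit, s, p, n):
    """Rebuild Minitab-style St Resid and R/X flags for one observation.

    Uses StDev Fit = s * sqrt(h), so leverage h = (stdev_fit / s) ** 2 and
    St Resid = residual / (s * sqrt(1 - h)). p counts coefficients,
    including the constant; n is the number of observations.
    """
    h = (stdev_fit / s) ** 2
    st_resid = residual / (s * math.sqrt(1 - h))
    flags = ""
    if abs(st_resid) > 2:
        flags += "R"  # large standardized residual
    if h > min(3 * p / n, 0.99):
        flags += "X"  # high leverage (assumed cutoff, see text)
    return round(st_resid, 2), round(h, 3), flags


S, P, N = 0.5867, 2, 67  # regression of logPBpro on Bushpro
rows = [(6, -0.4786, 0.2056), (7, 1.4977, 0.0717), (20, 0.1483, 0.1932), (50, 1.1656, 0.1727)]
for obs, resid, sd_fit in rows:
    print(obs, *diagnose(resid, sd_fit, S, P, N))
```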
Nothing much is going on here: Palm Beach's standardized residual is
about 2 SDs in a slightly heavy-tailed distribution, and it is by no
means the largest outlier (observations 7 and 39 are larger). Residual
plot (Palm Beach again marked "P"):
[Character plot: residuals C40 (vertical, -1.0 to 1.0) against Bushpro (horizontal, 0.320 to 0.720); Palm Beach ("P") is among the points above 1.0]
More interesting, however, is what happens if we regress the transformed
Buchanan proportion on the log of county size *and* the main vote split.
Not only does R-squared rise dramatically (from 11.3% to 45.0%),
indicating that Buchanan's support is mainly rural:
MTB > plot c39*c18
[Character plot: C39 (logPBpro, vertical, -7.0 to -4.0) against logPop (horizontal, 7.2 to 13.2)]
but Palm Beach suddenly pops off the charts as an outlier:
MTB > regress c39 2 c36 c18;
SUBC> resids c40.
Regression Analysis
The regression equation is
logPBpro = - 3.39 + 0.828 Bushpro - 0.252 logPop
Predictor Coef StDev T P
Constant -3.3894 0.6491 -5.22 0.000
Bushpro 0.8281 0.6610 1.25 0.215
logPop -0.25208 0.04028 -6.26 0.000
S = 0.4657 R-Sq = 45.0% R-Sq(adj) = 43.2%
Analysis of Variance
Source DF SS MS F P
Regression 2 11.3407 5.6704 26.14 0.000
Residual Error 64 13.8826 0.2169
Total 66 25.2234
Source DF Seq SS
Bushpro 1 2.8457
logPop 1 8.4951
Unusual Observations
Obs  Bushpro       C39       Fit  StDev Fit   Residual   St Resid
  6    0.314   -6.5731   -6.4678     0.1737    -0.1052      -0.24 X
  7    0.561   -4.0407   -5.0774     0.0931     1.0367       2.27R
 20    0.331   -5.9082   -5.5282     0.1751    -0.3800      -0.88 X
 22    0.563   -5.8950   -4.9631     0.1077    -0.9319      -2.06R
 44    0.493   -6.5416   -5.6008     0.0722    -0.9408      -2.04R
 50    0.359   -4.8267   -6.3587     0.1490     1.5320       3.47R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
MTB > plot c40*c18 #residuals against log of population
[Character plot: residuals C40 (vertical, -0.80 to 1.60) against logPop (horizontal, 7.2 to 13.2); the isolated point at the top, near 1.6, is Palm Beach (residual 1.53)]
Histogram
Histogram of C40 N = 67 #residuals of logPBpro against logPop and Bushpro
Midpoint Count
-1.0 2 **
-0.8 3 ***
-0.6 5 *****
-0.4 4 ****
-0.2 14 **************
0.0 13 *************
0.2 12 ************
0.4 7 *******
0.6 4 ****
0.8 1 *
1.0 1 *
1.2 0
1.4 0
1.6 1 *
*This*, I think, is evidence that Palm Beach does not fit the pattern,
and the assumptions of normality and homoscedasticity are well
supported.
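One rough way to gauge how surprising a standardized residual of 3.47 is among 67 counties is a Bonferroni-style bound under a normal approximation. This is an illustrative addition, not part of the analysis above: it treats the standardized residual as standard normal (a t reference on the deleted residual, with 63 df, would be more careful), and any heavy tails would make the bound optimistic.

```python
import math


def two_sided_normal_tail(z):
    """P(|Z| > z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))


st_resid = 3.47   # Palm Beach, regression on Bushpro and logPop
n_counties = 67

p_single = two_sided_normal_tail(st_resid)
# Bonferroni: bound on the chance that ANY of the 67 counties looks this extreme.
p_any = min(1.0, n_counties * p_single)
print(f"single county: {p_single:.5f}; any of {n_counties}: {p_any:.3f}")
```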
We can do the same regression on *only* log population:
MTB > regress c39 1 c18;
SUBC> resids c40
* NOTE * Subcommand does not end in . or ; (; assumed).
SUBC> .
Regression Analysis
The regression equation is
C39 = - 2.75 - 0.269 logPop
Predictor Coef StDev T P
Constant -2.7455 0.3983 -6.89 0.000
logPop -0.26941 0.03800 -7.09 0.000
S = 0.4678 R-Sq = 43.6% R-Sq(adj) = 42.7%
Analysis of Variance
Source DF SS MS F P
Regression 1 11.000 11.000 50.27 0.000
Residual Error 65 14.223 0.219
Total 66 25.223
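Comparing this fit with the two-predictor one is an extra-sum-of-squares question: the partial F for adding a term is the drop in residual SS divided by the full model's residual mean square (0.2169). A sketch using only the SS values printed above; with one added term, F should equal the square of that term's t in the two-predictor fit (-6.26 for logPop, 1.25 for Bushpro). The helper name is hypothetical.

```python
def partial_f(sse_reduced, sse_full, df_added, mse_full):
    """Extra-sum-of-squares F: drop in SSE per added df, over the full model's MSE."""
    return (sse_reduced - sse_full) / df_added / mse_full


MSE_FULL = 13.8826 / 64  # residual MS for logPBpro on Bushpro and logPop

# Adding logPop to the Bushpro-only model (residual SS 22.3777 -> 13.8826).
f_logpop = partial_f(22.3777, 13.8826, 1, MSE_FULL)
# Adding Bushpro to the logPop-only model (residual SS 14.223 -> 13.8826).
f_bushpro = partial_f(14.223, 13.8826, 1, MSE_FULL)

# With one added term, F should match the square of its t in the full model.
print(f"logPop:  F = {f_logpop:.2f}, t^2 = {6.26 ** 2:.2f}")
print(f"Bushpro: F = {f_bushpro:.2f}, t^2 = {1.25 ** 2:.2f}")
```

A small F for Bushpro is the same message as its P = 0.215: once county size is in the model, the vote split adds little.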
Unusual Observations
Obs   logPop       C39       Fit  StDev Fit   Residual   St Resid
  7      8.5   -4.0407   -5.0464     0.0901     1.0057       2.19R
 22      8.1   -5.8950   -4.9257     0.1039    -0.9694      -2.13R
 44     10.4   -6.5416   -5.5452     0.0572    -0.9964      -2.15R
 50     13.0   -4.8267   -6.2372     0.1137     1.4104       3.11R
MTB > plot c40*c18 #residuals of logPBpro against logpop
[Character plot: residuals C40 (vertical, about -0.8 to 0.8 and above) against logPop (horizontal, 7.2 to 13.2); the isolated point at the top is Palm Beach (residual 1.41)]
Just out of interest, IF we assume that the residual arises from ballot
errors, how many votes does it represent? Palm Beach's residual of 1.41
(on the log scale) means its Buchanan proportion is exp(1.41) = 4.1
times the fitted value. That would suggest that about 75% of the 3407
Buchanan votes were not intended, so that about 2500 were intended for
Gore.
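The arithmetic, spelled out: a log-scale residual r means the observed Buchanan proportion is exp(r) times the fitted one, so the share of observed votes above the fit is 1 - exp(-r). A small sketch using Palm Beach's residual of 1.41 from the logPop-only regression and the 3407 Buchanan votes; like the text, it assumes the whole excess is due to ballot errors. The function name is hypothetical.

```python
import math


def excess_votes(log_residual, observed_votes):
    """Votes above the fitted value implied by a log-scale residual."""
    factor = math.exp(log_residual)  # observed / fitted proportion
    excess_share = 1 - 1 / factor    # share of observed votes above the fit
    return factor, excess_share, excess_share * observed_votes


factor, share, votes = excess_votes(1.41, 3407)  # Palm Beach, logPop-only fit
print(f"factor {factor:.1f}, share {share:.0%}, about {votes:.0f} votes")
```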
Yrs aye,
Robert Dawson
=================================================================
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
http://jse.stat.ncsu.edu/
=================================================================