This year, circumstances dictated that the US Presidential race boiled
down to the results in Florida. That's where the decisive Electoral
Votes were and that's where the outcome was most uncertain. Since then,
Bush supporters have insisted that all the votes have been counted
and Gore supporters have insisted that not all the votes were counted.
The problem is that they are both right, given the ambiguity of the verb
"to count". For the purpose of clarity, I will discuss whether the votes
were *properly categorized* rather than whether they were "counted".

On Election Day, there were four categories of votes in Florida as far
as we are concerned: "Bush", "Gore", "Third Party" and "Rejected by
machine". The very fact that the difference between "Bush" and "Gore"
(1,725) was less than one percent of the "Rejected by machine" votes
(180,299) alone justified a recount, even if one wasn't automatically
required by Florida law. Florida law also allows for requested
handcounts and determination of voter intent by human beings, not
counting machines. Never mind that Bush and his supporters tried to
dismiss all handcounts as worse than machine counts (unless they take
place in New Mexico) and that Gore and his supporters failed to call for
a state-wide handcount before the deadline, thereby laying themselves
open to charges of "data mining".

We've all heard about the ongoing vote recount in Florida now being done
by various media groups rather than any political parties, and some have
dismissed it as a waste of time, or worse. To be fair, some of these
critics may feel they are being entirely objective in their assessment
and to call them partisan is unwarranted until proven. Critics like
Stephen Gould feel that the result is basically an ineradicable
"statistical tie", thanks to an ineradicable "margin of error". This is
a fallacy.

Most of us feel that we know what "margin of error" means but to make
sure we're all on the same page, let's review.

"Margin of error" is a term out of survey polling that refers to the
confidence we have in the results of a given survey. In general, the
margin of error corresponds to the 95% confidence interval. For example,
if a pre-election survey indicates that 49% of "likely voters" want
Bush, 47% want Gore and 4% want neither, but the margin of error is
+/-3%, this is a "statistical tie" because the difference between Bush
and Gore falls within the margin of error. If the difference between
Bush and Gore was 6 or more percentage points, we could say (given a
95% confidence interval) that in only one such survey out of twenty
would such a difference occur purely by chance. Only by taking a larger
sample (more expensive, more time-consuming) can we reduce the margin of
error further.
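
As a rough illustration of the survey case, here is a minimal Python
sketch of the usual margin-of-error calculation for a single proportion
under the infinite-population model. The function name and the sample
sizes are assumptions chosen for illustration; none of them come from an
actual poll.

```python
import math

def survey_margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval for one proportion p,
    estimated from a simple random sample of size n.
    Infinite-population model: no finite population correction."""
    return z * math.sqrt(p * (1.0 - p) / n)

# p = 0.5 is the worst case (largest margin) for a given sample size.
# The sample sizes below are illustrative, not taken from any real survey.
for n in (500, 1067, 2000, 4268):
    moe = survey_margin_of_error(0.5, n)
    print(f"n = {n:5d}  margin of error = +/-{100 * moe:.2f} points")
```

With a sample of about 1,067 the margin comes out at roughly 3
percentage points, and quadrupling the sample only halves it, which is
why shrinking the margin of a survey gets expensive quickly.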

The waste-of-time argument is based in large part on the assumption of
an ineradicable margin of error, which I will demonstrate is a false
assumption for vote recounts. In pre-election surveys (and for that
matter, most other surveys, e.g., "Is Coke better than Pepsi?"), we are
looking at a small sample compared to a huge population. The population
is typically 20 or more times greater than the sample. In those cases,
we can use the infinite-population model (i.e., a binomial distribution)
which is easy to calculate and a good approximation. But when the sample
is a significant part of the population, you simply can't ignore the fpc
(finite population correction).

Also known as the fpcf (finite-population correction factor), the fpc is
something that the better surveyors know about but seldom discuss, and
probably fewer than 2% of the talking heads discussing "margin of error"
are even remotely aware of it.

The formula for the finite population correction is

fpc  =  [(N - n)/(N - 1)]^0.5, where N  =  population size and n  =
sample size.

As you can see, whenever N >> n ("N is much greater than n"), then fpc
is essentially one and can be ignored. But the closer n gets to N, the
closer the fpc gets to zero and it becomes a significant factor in your
margin-of-error calculation.

For example, according to the AP, on Election Day, the accepted vote
total in Florida was 5,958,268, and the grand total of ballots cast was
6,138,567. This gives an fpc of 0.1714 or roughly 6/35. Naturally the
absentee ballots that came later raised both the accepted total and the
grand total but you can see how neglecting the fpc leads to significant
error in any post-election calculation of the margin of error.
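
The arithmetic can be checked with a few lines of Python. This is only a
sketch: the function name fpc and the poll-sized sample of 1,000 are
assumptions for illustration, while the two Florida totals are the AP
figures quoted above.

```python
import math

def fpc(N, n):
    """Finite population correction: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

N_BALLOTS = 6_138_567   # grand total of ballots cast (AP, Election Day)
N_ACCEPTED = 5_958_268  # accepted vote total (AP, Election Day)

# Election Day figures: the correction is far from 1.
print(round(fpc(N_BALLOTS, N_ACCEPTED), 4))  # 0.1714

# Hypothetical poll-sized sample: the correction is essentially 1.
print(fpc(N_BALLOTS, 1_000))

# Every ballot categorized: the correction is exactly 0.
print(fpc(N_BALLOTS, N_BALLOTS))
```

The three calls cover the cases described above: N >> n, n close to N,
and n equal to N.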

The statistic we wish to measure is

(Pg - Pb)  +/-  z*Sqrt[(Pg(1 - Pg) + Pb(1 - Pb))/n]*fpc,  where

Pg  =  proportion of votes for Gore in the sample,
Pb  =  proportion of votes for Bush in the sample,
n  =  sample size  =  number of *properly categorized* votes (including
absentee ballots),
N  =  population size  =  total number of ballots cast in Florida
(including absentee ballots),
z  =  confidence coefficient ( = 1.96 for 95% confidence level),
fpc  =  finite population correction  =  Sqrt[(N - n)/(N - 1)].

The argument between the Bush supporters and the Gore supporters was
essentially whether "Rejected by machine" is a proper category, i.e.,
whether or not "Rejected by machine" should be further broken down into
"Bush", "Gore", "Third Party" and "Invalids", all as determined by human
beings. Only when n becomes N does the margin of error go to zero. Then
and only then can we say that the vote has been fully counted and the
TRUE Presidential contest winner in Florida has been determined.
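
To see how the interval behaves as n approaches N, here is a minimal
Python sketch that evaluates the expression above exactly as written.
The function name and the two vote shares are hypothetical placeholders,
not actual Florida results; only the ballot totals are the AP figures
quoted earlier.

```python
import math

def difference_half_width(pg, pb, n, N, z=1.96):
    """Half-width of the interval for (Pg - Pb), using the expression
    z * sqrt((Pg(1 - Pg) + Pb(1 - Pb)) / n) * fpc from the text."""
    fpc = math.sqrt((N - n) / (N - 1))
    return z * math.sqrt((pg * (1 - pg) + pb * (1 - pb)) / n) * fpc

N_BALLOTS = 6_138_567   # total ballots cast (AP, Election Day)
N_ACCEPTED = 5_958_268  # accepted votes (AP, Election Day)

# HYPOTHETICAL vote shares, chosen only to illustrate a very close race.
PG, PB = 0.4884, 0.4887

for n in (N_ACCEPTED, 6_000_000, 6_100_000, N_BALLOTS):
    hw = difference_half_width(PG, PB, n, N_BALLOTS)
    print(f"n = {n:,}  (Pg - Pb) = {PG - PB:+.4f} +/- {hw:.6f}")
```

The half-width shrinks steadily as more ballots are categorized and is
exactly zero once n equals N, whatever shares are plugged in.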

For further insight into the fpc, see: hypergeometric distribution,
standard deviation of.

Jake

P.S.  Canada handcounted thirteen million votes in 2.5 hours. We look
like idiots.




