This algorithm, at first read, seems that it could still be NP-hard if all candidates were part of a (three-candidate???) cycle with all other candidates. Which is, of course, a case which would hardly ever arise in real life; but one which must be considered in giving a worst-case execution time order for the algorithm.
Jameson 2011/12/21 Richard Fobes <[email protected]> > As previously promised, I am revealing how Condorcet-Kemeny calculations > can be done fast. > > How fast? Calculating a "complex" case with 12 choices is so fast that I > haven't bothered to time the calculations. As an estimate, it is on the > order of a couple of seconds or less. > > The estimated time for calculating results for 50 choices (!) is less than > a minute -- which is far less than a "lifetime". > > The algorithm is described in the documentation that accompanies the > just-announced VoteFair ranking software. > > The full explanation of the algorithm is described on these three web > pages: > > > http://www.votefair.org/**calculation_details_**popularity.html<http://www.votefair.org/calculation_details_popularity.html> > > > http://www.votefair.org/**insertion_sort_ranking.html<http://www.votefair.org/insertion_sort_ranking.html> > > > http://www.votefair.org/**choice_specific_pairwise_**score.html<http://www.votefair.org/choice_specific_pairwise_score.html> > > For convenience, below is the description for the "insertion-sort" part of > the algorithm, which is the lowest-level part of the algorithm. It is > followed by a brief description of the full algorithm (excluding the > initial estimation algorithm). > > As a reminder, the Condorcet-Kemeny method calculates a full ranking of > all choices; it does not just find the single winner. > > Also consider that the Condorcet-Kemeny method does not specify how to > resolve sequence-score ties. The VoteFair ranking software does resolve > such sequence-score ties by identifying which choices are tied at which > ranking levels. > > Richard Fobes > > > -------- Insertion-sort part of the algorithm --------- > > This explanation clarifies how the > well-known insertion-sort algorithm is applied > to VoteFair popularity ranking in a way that > greatly reduces the number of calculations > needed to find maximum sequence scores. > (This method is just part of the full > algorithm, which is explained [elsewhere].) > > Consider an example in which there are five > choices named A, B, C, D, and E, with a final > sort order that matches this alphabetical > order. > > Notation: The notation A>B refers to how > many voters pairwise prefer choice A over > choice B, and the notation B>A refers to how > many voters pairwise prefer choice B over > choice A. This notation always uses the > "greater-than" symbol ">", and never uses > the "less-than" symbol "<". > > At an intermediate stage in this sorting > example, suppose that the choices A, C, and E > have been sorted -- into this correct > order -- and choice B is about to be > sorted, and choice D remains unsorted. > The pairwise counts for this arrangement are > shown below. The asterisks show the > separation between sorted choices and unsorted > choices. > > | | | | | | > | A | C | E | B | D | > | | | | | | > -----***************************-------+-------+ > * \ | | * | | > A * \ | A>C | A>E * A>B | A>D | > * \ | | * | | > -----+-------+-------+-------+**-------+-------+ > * | \ | * | | > C * C>A | \ | C>E * C>B | C>D | > * | \ | * | | > -----+-------+-------+-------+**-------+-------+ > * | | \ * | | > E * E>A | E>C | \ * E>B | E>D | > * | | \ * | | > -----***************************-------+-------+ > | | | | \ | | > B | B>A | B>C | B>E | \ | B>D | > | | | | \ | | > -----+-------+-------+-------+**-------+-------+ > | | | | | \ | > D | D>A | D>C | D>E | D>B | \ | > | | | | | \ | > -----+-------+-------+-------+**-------+-------+ > > The diagonal line passes through empty cells > -- that would otherwise represent a > choice's comparison with itself, such as > A>A. > > The diagonal line also is the border between > the upper-right triangular area and the > lower-left triangular area. The sequence > score for the current sequence is the sum of > all the pairwise counts in the upper-right > triangular area (currently A>C + A>E + > A>B + A>D + C>E + C>B + C>D + > E>B + E>D + B>D). > > The goal of these calculations is to find > the maximum sequence score, which means the > goal is to change the sequence so that the > largest pairwise counts move into the > upper-right triangular area, leaving the > smallest pairwise counts in the lower-left > triangular area. > > The first step in sorting choice B is to > consider the possibility of moving it to the > left of choice E, which would form the sequence > A, C, B, E. Here is the pairwise-count > matrix for this sequence. The asterisks > now include choice B because this is a possible > sort order. > > | | | | | | > | A | C | B | E | D | > | | | | | | > -----***********************************-------+ > * \ | | | * | > A * \ | A>C | A>B | A>E * A>D | > * \ | | | * | > -----*-------+-------+-------+**-------*-------+ > * | \ | | * | > C * C>A | \ | C>B | C>E * C>D | > * | \ | | * | > -----*-------+-------+-------+**-------*-------+ > * | | \ | * | > B * B>A | B>C | \ | B>E * B>D | > * | | \ | --- * | > -----*-------+-------+-------+**-------*-------+ > * | | | \ * | > E * E>A | E>C | E>B | \ * E>D | > * | | --- | \ * | > -----***********************************-------+ > | | | | | \ | > D | D>A | D>C | D>B | D>E | \ | > | | | | | \ | > -----+-------+-------+-------+**-------+-------+ > > The only pairwise counts that crossed the > diagonal line are the (underlined) counts > B>E and E>B, which swapped places. > All the other pairwise counts that move do not > cross the diagonal line; they stay on the same > side of the diagonal line. > > As a result, the score for this sequence, > compared to the score for the previous > sequence, increases (or decreases if negative) > by the amount B>E minus E>B. In > this case (assuming there are no complications > that are explained later) the sequence score > has increased because in the final > (alphabetical) sort order, choice B appears > before choice E. > > The next step in sorting choice B is to > consider the possibility of moving it to the > left of choice C, which would form the sequence > A, B, C, E. Here is the pairwise-count > matrix for this sequence. > > | | | | | | > | A | B | C | E | D | > | | | | | | > -----***********************************-------+ > * \ | | | * | > A * \ | A>B | A>C | A>E * A>D | > * \ | | | * | > -----*-------+-------+-------+**-------*-------+ > * | \ | | * | > B * B>A | \ | B>C | B>E * B>D | > * | \ | --- | * | > -----*-------+-------+-------+**-------*-------+ > * | | \ | * | > C * C>A | C>B | \ | C>E * C>D | > * | --- | \ | * | > -----*-------+-------+-------+**-------*-------+ > * | | | \ * | > E * E>A | E>B | E>C | \ * E>D | > * | | | \ * | > -----***********************************-------+ > | | | | | \ | > D | D>A | D>B | D>C | D>E | \ | > | | | | | \ | > -----+-------+-------+-------+**-------+-------+ > > The only pairwise counts that crossed the > diagonal line are the (underlined) counts > B>C and C>B, which swapped places. > The other pairwise counts that moved remained > on the same side of the diagonal line. > > The score for this sequence increases (or > decreases if negative) by the amount B>C > minus C>B. In this case the sequence > score has increased because (in the final > alphabetical order) choice B appears before > choice C. > > The final step in sorting choice B is to > consider the possibility of moving it to the > left of choice A, which would form the sequence > B, A, C, E. Here is the matrix for this > sequence. > > | | | | | | > | B | A | C | E | D | > | | | | | | > -----***********************************-------+ > * \ | | | * | > B * \ | B>A | B>C | B>E * B>D | > * \ | --- | | * | > -----*-------+-------+-------+**-------*-------+ > * | \ | | * | > A * A>B | \ | A>C | A>E * A>D | > * --- | \ | | * | > -----*-------+-------+-------+**-------*-------+ > * | | \ | * | > C * C>B | C>A | \ | C>E * C>D | > * | | \ | * | > -----*-------+-------+-------+**-------*-------+ > * | | | \ * | > E * E>B | E>A | E>C | \ * E>D | > * | | | \ * | > -----***********************************-------+ > | | | | | \ | > D | D>B | D>A | D>C | D>E | \ | > | | | | | \ | > -----+-------+-------+-------+**-------+-------+ > > The only pairwise counts that crossed the > diagonal line are the (underlined) counts > B>A and A>B, which swapped places. > The other pairwise counts that moved remained > on the same side of the diagonal line. > > The score for this sequence increases (or > decreases if negative) by the amount B>A > minus A>B. In this case the sequence > score has decreased because (in the final > alphabetical order) choice B appears after, not > before, choice A. > > At this point choice B has been tested at > each position within the sorted portion. > The maximum sequence score (for the sorted > portion) occurred when it was between choices A > and C. As a result, choice B will be > moved to the position between choices A and > C. > > Notice that the full sequence score did not > need to be calculated in order to find this > "local" maximum. These calculations only > need to keep track of increases and decreases > that occur as the being-sorted choice swaps > places with successive already-sorted > choices. > > The pairwise-count matrix with choice B in > the second sort-order position (between A and > C) is shown below. Now choice D is the > only unsorted choice. > > | | | | | | > | A | B | C | E | D | > | | | | | | > -----***********************************-------+ > * \ | | | * | > A * \ | A>B | A>C | A>E * A>D | > * \ | | | * | > -----*-------+-------+-------+**-------*-------+ > * | \ | | * | > B * B>A | \ | B>C | B>E * B>D | > * | \ | | * | > -----*-------+-------+-------+**-------*-------+ > * | | \ | * | > C * C>A | C>B | \ | C>E * C>D | > * | | \ | * | > -----*-------+-------+-------+**-------*-------+ > * | | | \ * | > E * E>A | E>B | E>C | \ * E>D | > * | | | \ * | > -----***********************************-------+ > | | | | | \ | > D | D>A | D>B | D>C | D>E | \ | > | | | | | \ | > -----+-------+-------+-------+**-------+-------+ > > Choice D would be sorted in the same > way. Of course the maximum sequence score > would occur when choice D is between choices C > and E, so D is moved there. > > | | | | | | > | A | B | C | D | E | > | | | | | | > -----******************************************* > * \ | | | | * > A * \ | A>B | A>C | A>D | A>E * > * \ | | | | * > -----*-------+-------+-------+**-------+-------* > * | \ | | | * > B * B>A | \ | B>C | B>D | B>E * > * | \ | | | * > -----*-------+-------+-------+**-------+-------* > * | | \ | | * > C * C>A | C>B | \ | C>D | C>E * > * | | \ | | * > -----*-------+-------+-------+**-------+-------* > * | | | \ | * > D * D>A | D>B | D>C | \ | D>E * > * | | | \ | * > -----*-------+-------+-------+**-------+-------* > * | | | | \ * > E * E>A | E>B | E>C | E>D | \ * > * | | | | \ * > -----******************************************* > > Now there are no more choices to sort, so > the resulting sequence is A, B, C, D, E. > In this sequence the full sequence score > -- which equals A>B + A>C + A>D + > A>E + B>C + B>D + B>E + C>D + > C>E + D>E -- is likely to be the > highest possible sequence score. > > Additional calculations, as described below, > are needed because in rare cases it is possible > that moving two or more choices at the same > time could produce a higher sequence > score. This concept is analogous to > climbing a mountain in foggy conditions by > always heading in the locally higher direction > and ending up at the top of a peak and then, > when the fog clears, seeing a higher peak. > > ------- Full calculation method for VoteFair popularity ranking ------- > > This is a description of the full algorithm > used to calculate VoteFair popularity ranking > results. > > The algorithm begins by calculating the > Choice-Specific Pairwise-Score ranking. > This pre-sort is a required part of the > process. Without it, some unusual cases > can cause the calculations to fail to find the > sequence with the highest score. This > pre-sort is analogous to starting a search for > the highest mountain peak within a mountain > range instead of starting the search within a > large valley. > > The next step is to apply the insertion-sort > method as described in the section above, > including starting at the left/highest end. > > To ensure that all possible moves of each > choice are considered, the insertion-sort > method is done in both directions. > Sorting in both directions means that in some > sorting passes sorting moves choices to the > left, as explained in the above example. > In other sorting passes sorting starts by > considering the right-most choice as the first > sorted choice, and choices move to the right, > into the sorted portion. This convention > ensures movement for choices that need to move > right, instead of left, in order to cause an > increase in the score. > > Complications can arise when there is > "circular ambiguity", so additional steps are > used. The most common cases of circular > ambiguity involve several choices that are tied > for the same sort-order position. > > A key part of dealing with circular > ambiguity is to follow this convention: > whenever a choice can produce the same, > highest, sequence score at more than one > position, the choice is moved to the farthest > of those highest-sequence-score positions. > > Another part of dealing with these > complications is to sort the sequence multiple > times. > > During the second sorting pass, if there is > no circular ambiguity, the sequence of the > choices in the pairwise matrix remains the > same. This lack of movement (when there > is no circular ambiguity) occurs because the > sorted and unsorted portions are > adjacent. Specifically, each choice to be > sorted is already at the top (for left-movement > sorting) or bottom (for right-movement sorting) > of the "unsorted" portion, and it is being > moved to the bottom (for left-movement sorting) > or top (for right-movement sorting) of > the "sorted" portion. In such cases > the only thing that moves is the boundary > between the sorted choices and unsorted > choices. > > However, in cases that involve circular > ambiguity, the positions of some choices will > change during the second and later sorting > passes. This happens because the > convention (as explained above) is to move each > choice as far as it will go, within the limits > of maximizing the sequence score. > > During the sorting passes the highest > sort-order (sequence) position of each choice > is tracked, and the lowest sort-order position > of each choice is tracked. These highest > and lowest positions are reset (to current > positions) whenever the sequence score > increases to a higher score. At the end > of the sorting process the highest and lowest > positions reveal which choices are tied at the > same popularity ranking level. > > Using the insertion-sort example, if choices > B, C, and D can be in any order and still > produce the same highest sequence score, then > each of these choices would move to the left of > the other two each time it is sorted, and each > of these choices would have the same > highest-ranked position of second place, and > each would have the same lowest-ranked position > of fourth place. Because these three choices > have the same highest and lowest positions, > they are easily identified as tied (at the same > popularity ranking). > > More complex cases of circular ambiguity can > occur. To deal with these cases, and to > ensure the correctness of the "winner" (the > most popular choice), the sorting process is > repeated for the top half (plus one) of the > highest-ranked choices, and this sub-set > sorting is repeated until there are just three > choices. For example, if there are 12 > choices, the sorting process is done for 12 > choices, then the top 7 choices, then the top 4 > choices, and finally the top 3 choices. > Then the highest-ranked choice (or the choices > that are tied at the top) is kept at the > highest rank while the other choices are sorted > a final time. (If, instead, the > least-popular choice is the most important one > to identify correctly, the data supplied to > this algorithm can be inverted according to > preference levels, and then the calculated > ranking can be reversed.) > > As a clarification, the extra sub-set > sorting is done only if more than one sequence > has the same highest sequence score. This > point is significant if the distinction between > VoteFair popularity ranking and the > Condorcet-Kemeny method is relevant. > Specifically, the Condorcet-Kemeny method does > not indicate how such "tied" sequence scores > should be resolved, whereas VoteFair popularity > ranking resolves such "tied" sequence scores as > part of its calculation process. > > After all the sorting has been done, the > highest and lowest ranking levels are used to > determine the results. For each choice > its highest and lowest ranking levels > are added together (which equals twice their > average) and then multiplied times a > constant. The constant equals 10 times > the number of choices minus one. These > numbers are converted to integers, and then > these "averaged scaled integerized" values are > used as the non-normalized ranking > levels. Two or more choices are ranked at > the same level if they have the same > "averaged-scaled-integerized" ranking > values. > > The final calculation step is to normalize > the "averaged-scaled-integerized" ranking > levels so that the normalized ranking levels > are consecutive, namely 1, 2, 3, etc. (so that > no ranking levels are skipped). > > The result is a ranking that identifies > which choice is first-most popular, which > choice is second-most popular, and so on down > to which choice is least popular. Ties > can occur at any level. > > ------ Calculation time ------ > > The full algorithm used to calculate > VoteFair popularity ranking results has a > calculation time that is less than or equal to > the following polynomial function: > > T = A + ( B * N ) + ( C * ( N * N ) ) > > where T is the calculation time, N is the > number of choices, and A and B and C are > constants. (In mathematical notation, N * > N would be written as N squared.) This > function includes the calculation time required > for the Choice-Specific Pairwise-Score (CSPS) > pre-sort calculations. > > This time contrasts with the slow execution > times of the "NP-hard" approach, in which > every sequence score is calculated in order to > find the sequence with the highest score. > If every sequence score were calculated (from > scratch), the calculation time would be > proportional to: > > N! * N * N > > where N is the number of choices, N! is N > factorial (2 * 3 * 4 * ... * N), and N * N > equals N squared. Note that N factorial > equals the number of possible sequences, and N > squared times one-half approximately equals the > number of pairwise counts that are added to > calculate each sequence score. > > This clarification about calculation time is > included because there is an academically > common -- yet mistaken -- belief that > calculating the "Condorcet-Kemeny method" is > "NP-hard" and cannot be calculated in a time > that is proportional to a polynomial function > of N (the number of choices). > > (c) Copyright 2011 Richard Fobes at VoteFair.org > > (This description copied from VoteFair.org > with permission.) > > ---- > Election-Methods mailing list - see http://electorama.com/em for list info >
---- Election-Methods mailing list - see http://electorama.com/em for list info
