To voting-theory people... A new paper on voting methods just came out, with a rather different angle than usual (1. written by computer, not political, scientists; 2. oriented toward task of ranking web pages by search engines, i.e. 100000 candidates, 20 voters, versus usual political scenario of 10 candidates, 100000 voters; 3. rather different notions of "good quality"... want the top 10 web pages to have a good chance of containing the one you want and not "spam"...). It might be interesting to try to take their voting methods and put them into my computer program that compares different voting systems, and see what happens.
Cynthia Dwork Ravi Kumar Moni Naor D. Sivakumar: Rank Aggregation Methods for the Web, http://www10.org/cdrom/papers/577/ They do not give pseudocode for their algorithms... They have 3 main ideas (#1 and #2 are competing ideas, and they ultimately prefer #2): 1. use of "min-weight matching in a complete bipartite graph" to find the permutation "best approximating" the vote-permutations 2. construction of various Markov chains on the candidates, and the rank-ordering output by the voting system then corresponds to the stationary distribution of the Markov chain. 3. "local Kemenyization" to "improve" the rank-ordering output by any voting system. [See, e.g. their proposition 4 and their Markov chains MC1,MC2,MC3,MC4.] All these ideas are much more sophisticated algorithmically than the usual voting algorithms, but still are polynomial time. After experiments they conclude "Of all the methods... MC4 outperforms all others... [based on their human judgement of how the web pages "should" have been ranked] beats Borda by a huge margin... Local Kemenyization seems to improve [everything tried] by about 1-3%..." UnKemenyized MC4 seems to be the following: A. consider the following Markov chain. If at a candidate P, pick a candidate Q at random: if Q>P according to a majority of voters, go to Q, else stay at P. B. find the stationary-limit distribution of this Markov chain [this can be done by finding the Perron-Frobenius eigenvector of a certain CxC matrix if there are C candidates] and rank candidates according to their (decreasing) probabilities in this distribution. MC4 is rather similar to Condorcet's method, but intuitively seems like it might be better than Condorcet since it takes advantage of information that Condorcet ignores/discards? On the other hand, note that, like the "Copeland" system it discards some info that Condorcet uses -- namely, the victory-margins in the pairwise comparisons. That suggests it would be worse than Condorcet? Their MC1 and MC3 schemes do not discard victory margins but experimentally are worse than MC4 in the sense that they are "more vulnerable to spam." (They did not do the MC4 vs Condorcet experimental comparison, so one cannot tell.) Typically most candidates will have ZERO stationary probability and there will be a small intransitive cycle of candidates with nonzero, and these candidates often will have non-tied rankings... Here is the abstract of a talk Dwork gave on this: ------------------------------------------------------------------------- Topic: Rank Aggregation Revisited Speaker: Cynthia Dwork Date: 01/11/2001 Abstract: The rank aggregation problem is to combine many different rank orderings on the same set of candidates, or alternatives, in order to obtain a ``better'' ordering. Rank aggregation has been studied extensively in the context of social choice theory, where several ``voting paradoxes'' have been discovered. It also arises in sports (How to rank/compare players?), collaborative filtering, meta-search, and database middleware. A natural step toward aggregation was taken by Kemeny. Informally, given k input orderings, a Kemeny-Optimal aggregation is an ordering that minimizes the sum of ``bubble sort'' distances to the input orderings. Thus, intuitively, Kemeny optimal solutions produce best compromise orderings. However, finding a Kemeny optimal aggregation is NP-hard. We revisit rank aggregation with an eye toward reducing spam - -- strategic manipulation of web pages in order to achieve an ``undeservedly'' high ranking -- in meta-search. We note the virtues of Kemeny-Optimal aggregation in this context and develop a natural relaxation that preserves the spam-fighting capabilities of Kemeny-Optimality at vastly reduced cost. We show how to efficiently take *any* initial aggregation of the input orderings and produce its ``local Kemenization'' -- intuitively, a maximally consistent locally Kemeny-Optimal solution. This yields a new approach to rank aggregation: ``Given the input orderings, first construct any desirable initial aggregation and then output its local Kemenization.'' We propose the use of Markov chains for obtaining the initial aggregation, and suggest specific chains for this purpose. Joint work with Ravi Kumar, Moni Naor, and D. Sivakumar. ---------------------------------------------------------------
