Hi,

2009/4/6 Tom Steinberg <[email protected]>:
> In the first instance I leaned away from pairwise because a) it
> doesn't show you how good your prediction skills are compared with
> other players

If we use an Elo-type system (i.e. chess ratings) then it can show you
something like this. It would be able to show you the expected result of
the face-off based on previous results. In fact it would probably be
*more* interesting, as it could say "that's what we thought you'd say!" -
I'd find that more amusing than knowing the average score given by others
(which is always going to revert towards the mean over time, and become
less and less interesting).

> b) I thought it would take more plays than my spec to
> get 217k images usable ratings.

You may have discarded the pairwise option because on a quick reflection
you might imagine each possible pair must be played at least once...? If
we are assuming a normal distribution in outcomes of games, then only a
tiny proportion of the possible pairings need be played, as long as the
pairings are randomly selected from the whole sample.

> Can someone mathematical tell me how
> many 'plays' of a pairwise game it would take to get any useful
> results (according to any definition of useful you choose)? I am no
> mathmo and might have got this quite wrong.

I don't know about the definition of useful/usable. On what basis was it
decided that 3 ratings on a 1-10 scale makes the rating useful?

Out of interest, based on a simulation I just ran (which of course could
have had holes in it, seeing as I'm not a statistician...), with N being
the number of images:

If you do N * 5 plays you will get a distribution of scores from -600 to
600, with each rating having a range of plus-or-minus 200 at a 90%
confidence level.

If you do N * 200 plays you will get a distribution of scores from -2000
to 2000, with each rating having a range of plus-or-minus 60 at a 90%
confidence level.

At least with the pairwise system we can statistically express our
confidence in the relative ranking. With a scoring system, I have no idea
how you would start to express this confidence. Maybe someone else does?

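For anyone who wants to poke at this, here is a minimal sketch of the
kind of Elo-type rating and random-pairing simulation described above. It
is not the simulation behind the figures quoted in this mail, and it is
not meant to reproduce them. Everything specific in it is an assumption
made for illustration: the K-factor of 32, the conventional 400-point
scale, ratings starting at 0, hidden "true" qualities drawn from a normal
distribution, and the function names themselves.

```python
import random

K = 32.0       # assumed update step (K-factor); not taken from the thread
SCALE = 400.0  # conventional Elo scale; also an assumption here


def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / SCALE))


def update(r_a, r_b, a_won):
    """Return the new (r_a, r_b) after one face-off between A and B."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = K * (s_a - e_a)  # big surprise -> big rating change
    return r_a + delta, r_b - delta


def simulate(n_items, plays_per_item, seed=0):
    """Play n_items * plays_per_item randomly paired face-offs.

    Each item gets a hidden 'true' quality (assumed normally distributed);
    the winner of each face-off is drawn from the Elo model applied to
    those hidden qualities. Returns (truth, ratings).
    """
    rng = random.Random(seed)
    truth = [rng.gauss(0.0, 200.0) for _ in range(n_items)]
    ratings = [0.0] * n_items  # everyone starts at 0
    for _ in range(n_items * plays_per_item):
        a, b = rng.sample(range(n_items), 2)  # random pair, a != b
        a_won = rng.random() < expected_score(truth[a], truth[b])
        ratings[a], ratings[b] = update(ratings[a], ratings[b], a_won)
    return truth, ratings


if __name__ == "__main__":
    truth, ratings = simulate(n_items=200, plays_per_item=5)
    print("lowest rating:", min(ratings), "highest rating:", max(ratings))
```

The expected_score() value is what would let the game say "that's what
we thought you'd say!" before revealing the result. Rerunning simulate()
with different seeds and looking at how much each item's rating moves
between runs is one rough way to get at the plus-or-minus range.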
> I don't think that either
> my argument or the other side is especially strong when it comes to
> confidently assessing which one will be played more, leaving aside the
> quality of the outputs, or the number of plays required to get a
> useful dataset.

Umm.. how can it be relevant to leave aside the quality of outputs or
usefulness of a dataset!?

I wish I was a statistician or a mathematician, but it's fun playing at
it.

Seb

_______________________________________________
Mailing list [email protected]
Archive, settings, or unsubscribe:
https://secure.mysociety.org/admin/lists/mailman/listinfo/developers-public
