Hi,

2009/4/6 Tom Steinberg <[email protected]>:
> In the first instance I leaned away from pairwise because a) it
> doesn't show you how good your prediction skills are compared with
> other players

If we use an Elo-type system (i.e. chess ratings) then it can show you
something like this.  It would be able to show you the expected result
of the face-off based on previous results.  In fact it would probably
be *more* interesting, as it could say "that's what we thought you'd
say!" - I'd find that more amusing than knowing the average score
given by others (which is always going to revert towards the mean over
time, and become less and less interesting).
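
For anyone who hasn't met Elo before, here is a rough sketch of the
idea in Python.  The 400-point scale and K-factor of 32 are just the
usual chess conventions, not anything we've agreed on, and the function
names are made up for illustration:

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one face-off."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * ((1.0 if a_won else 0.0) - exp_a)
    # Whatever A gains, B loses, so the total number of points is conserved.
    return rating_a + delta, rating_b - delta


def matches_prediction(rating_a, rating_b, picked_a):
    """True if the player's pick agrees with what the ratings predicted."""
    predicted_a = expected_score(rating_a, rating_b) >= 0.5
    return picked_a == predicted_a
```

The "that's what we thought you'd say!" message would simply be shown
whenever matches_prediction() comes back True.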

> b) I thought it would take more plays than my spec to
> get 217k images usable ratings.

You may have discarded the pairwise option because, on a quick
reflection, you imagined that every possible pair must be played at
least once...?  If we assume a normal distribution in the outcomes of
games, then only a tiny proportion of the possible pairings need be
played, as long as the pairings are randomly selected from the whole
sample.
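
To give a feel for the scale, here is a bit of arithmetic using the
217k figure quoted above.  The choice of 5 plays per image is only an
example, and the function name is made up; this shows how small the
sampled fraction is, not that such a sample is enough:

```python
from math import comb


def fraction_of_pairs(n_items, plays_per_item):
    """Share of all distinct pairings covered by n_items * plays_per_item plays.

    Assumes no pairing is ever repeated, so this is an upper bound.
    """
    all_pairs = comb(n_items, 2)
    plays = n_items * plays_per_item
    return plays / all_pairs


n_images = 217_000
all_pairs = comb(n_images, 2)               # every distinct pairing
fraction = fraction_of_pairs(n_images, 5)   # 5 plays per image, as an example
print(f"{all_pairs:,} possible pairs; sampled fraction {fraction:.6%}")
```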

> Can someone mathematical tell me how
> many 'plays' of a pairwise game it would take to get any useful
> results (according to any definition of useful you choose)? I am no
> mathmo and might have got this quite wrong.

I don't know about the definition of useful/usable.  On what basis was
it decided that 3 ratings on a 1-10 scale makes the rating useful?

Out of interest, based on a simulation I just ran (which of course
could have had holes in it, seeing as I'm not a statistician...):

If you do N * 5 plays (where N is the number of images) you will get a
distribution of scores from -600 to 600, with each rating accurate to
plus-or-minus 200 at a confidence level of 90%.
If you do N * 200 plays you will get a distribution of scores from
-2000 to 2000, with each rating accurate to plus-or-minus 60 at a
confidence level of 90%.
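
For anyone who wants to poke at this themselves, a simulation along
these lines can be sketched as below.  This is not the code I ran and
won't reproduce the numbers above; the hidden "true strength" of each
image, the 400-point scale, the K-factor of 32 and all the names are
assumptions made for illustration:

```python
import random


def simulate(n_items=200, plays_per_item=5, k=32.0, scale=400.0, seed=0):
    """Play random pairwise games and return (true_strength, ratings)."""
    rng = random.Random(seed)
    # Assumption: each item has a hidden, normally distributed strength.
    true_strength = [rng.gauss(0.0, 400.0) for _ in range(n_items)]
    ratings = [0.0] * n_items  # everyone starts level

    for _ in range(n_items * plays_per_item):
        # Pick two distinct items at random from the whole sample.
        a, b = rng.sample(range(n_items), 2)

        # The outcome is drawn from the hidden strengths...
        p_a = 1.0 / (1.0 + 10 ** ((true_strength[b] - true_strength[a]) / scale))
        a_won = rng.random() < p_a

        # ...but the update only sees the current ratings.
        exp_a = 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / scale))
        delta = k * ((1.0 if a_won else 0.0) - exp_a)
        ratings[a] += delta
        ratings[b] -= delta

    return true_strength, ratings


true_strength, ratings = simulate()
print("rating range:", round(min(ratings)), "to", round(max(ratings)))
```

Running it for several seeds and looking at how much each item's rating
varies from run to run is one way to get at the plus-or-minus figures.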

At least with the pairwise system we can statistically express our
confidence in the relative ranking.  With a scoring system, I have no
idea how you would start to express this confidence.  Maybe someone
else does?

> I don't think that either
> my argument or the other side is especially strong when it comes to
> confidently assessing which one will be played more, leaving aside the
> quality of the outputs, or the number of plays required to get a
> useful dataset.

Umm.. how can it be relevant to leave aside the quality of outputs or
usefulness of a dataset!?

I wish I was a statistician or a mathematician, but it's fun playing at it.

Seb
