>> if you think about it, it simply *cannot* be accurate.
>
> Is that logic or is credulity speaking?

credulity, since i have no information about the specifics of how
these two groups are being measured, other than that they're not
being measured in the same way.

to be clear, i'm talking about comparing these two kinds of rankings:

* humans playing against each other in real life in tournaments
organized by, say, FIDE.

* computers playing against one another and against people on a computer server.

two assumptions that i'm making here:

the first kind of ranking is what i assume people use when they talk
about famously strong (top few in the world) chess-playing humans.
when someone mentions kramnik's ELO rating, i'm guessing they don't
mean kramnik as to be found on some chess server somewhere. the second
kind of ranking is what i assume is the only well-measured kind of
ranking available for computer players.

however, even if computers were allowed to play in FIDE-governed
tournaments and to achieve all of their ELO points that way, i still
don't think that their ELO ratings would accurately predict their
winrates against the top players if the top players weren't all
playing against them with the same frequency that they play against
humans. additionally, for the reasons i mentioned in my last posting,
i think that a very small pool of computer-only players can generate a
fairly skewed ELO ranking because it is measuring a different thing --
namely, how well-ordered the set of computer players is.

a final reason why we shouldn't expect ELO to be comparable between
the two groups is that when comparing two far-from-comparable groups
using ELO, it becomes very difficult to estimate winrates:

imagine if the only way to get a go ranking on a go server was for the
server to estimate the percentage of the time that you would win
against someone 20 stones stronger. it would assign you a default
minimum ranking and the player 20 stones stronger would slowly accrete
ELO from you and your ilk. something less dramatic but similar is
(presumably) occurring with computer players right now. if the winrate
is effectively extremely high against humans, the lowest-strength
computer player with that property will have an ELO higher than the
highest-ranked human it has played. if these programs are close to
well-ordered, and if their mutual winrates are high, then the ELO
calculation for each of them should rapidly push the top program very
far above the top human, even if the top human has never played the
top program. the ELO will get sucked from whoever is willing to play
them, but magnified because of the stratification of winrates that
(presumably) occurs.
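
to put rough numbers on the stratification argument above, here is a
minimal sketch using the standard logistic ELO expected-score formula.
the 2800 anchor, the 90% winrates and the three-step ladder are all
made-up values for illustration, not measurements, and draws are
ignored (a winrate is treated as an expected score).

```python
import math

def expected_score(r_a, r_b):
    """Standard logistic Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def implied_gap(winrate):
    """Rating difference at which Elo predicts the given winrate."""
    return 400.0 * math.log10(winrate / (1.0 - winrate))

def chain_top_rating(anchor, winrates):
    """Rating at the top of a ladder where each rung beats the one
    below it at the given winrate, starting from the anchor rating."""
    rating = anchor
    for w in winrates:
        rating += implied_gap(w)
    return rating

# hypothetical numbers: a top human at 2800, and three programs, each
# beating the one below it (the weakest beats the human) 90% of the time.
top_human = 2800.0
ladder = [0.90, 0.90, 0.90]

top_program = chain_top_rating(top_human, ladder)

# what Elo then "predicts" for top program vs top human, even though
# in this scenario those two have never played each other.
predicted = expected_score(top_program, top_human)
```

each 90% rung is worth about 382 points, so the top program ends up
roughly 1145 points above the human, and the formula then claims it
would score about 99.9% against a player it has never met.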

it would be relatively easy to see to what extent this effect matters
if anyone knows the pairwise winrates of the top (unmodified) computer
players over a long enough time period.
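
one way such a check might look, assuming pairwise winrates were
available: under the logistic ELO model the odds multiply along a
chain, so the A-vs-C winrate is fixed once A-vs-B and B-vs-C are known.
the function name and the example numbers below are hypothetical.

```python
def chain_discrepancy(p_ab, p_bc, p_ac):
    """Observed A-vs-C winrate minus the winrate the logistic Elo
    model implies from the A-vs-B and B-vs-C winrates."""
    # under logistic Elo, odds(A beats C) = odds(A beats B) * odds(B beats C)
    odds = (p_ab / (1.0 - p_ab)) * (p_bc / (1.0 - p_bc))
    implied = odds / (1.0 + odds)
    return p_ac - implied

# hypothetical example: A beats B 90%, B beats C 90%, but A beats C
# only 95% of the time when they actually play.
gap = chain_discrepancy(0.90, 0.90, 0.95)
```

a clearly negative result, as in this made-up example, would mean the
ratings chained through the middle player overstate how often the top
one actually wins.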

s.
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go