When you measure win rates against players with a given rating, you
measure both how well player strength predicts probability of winning,
and how accurately the ratings reflect player strength. Sometimes the
ratings are quite inaccurate. This causes win rates to regress towards
50%. If you can increase the accuracy of the ratings, this tendency to
regress is reduced. So a win-rate->strength conversion that works for
a stable setup with accurate ratings should not be expected to apply
in a less stable setting with less certain ratings.
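
To make the regression effect concrete, here is a minimal sketch. It
rests on assumptions that are mine, not something measured: win
probability follows the standard Elo logistic curve, and the true
strength difference equals the nominal rating difference plus Gaussian
error with standard deviation sigma. The function names
(elo_expected, observed_win_rate) are made up for illustration. It
averages the curve over that error, and the resulting win rate drifts
towards 50% as sigma grows.

```python
import math

def elo_expected(diff):
    """Standard Elo expected score for a rating advantage of `diff` points."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def observed_win_rate(nominal_diff, sigma, steps=2001, width=6.0):
    """Average the Elo curve over Gaussian error in the true difference.

    Assumption: true difference = nominal_diff + N(0, sigma^2).
    Uses a plain weighted sum on a symmetric grid of +/- `width`
    standard deviations instead of random sampling, so the result
    is deterministic.
    """
    if sigma == 0:
        return elo_expected(nominal_diff)
    total = 0.0
    weight_sum = 0.0
    for i in range(steps):
        z = -width + 2.0 * width * i / (steps - 1)
        w = math.exp(-0.5 * z * z)  # unnormalised Gaussian weight
        total += w * elo_expected(nominal_diff + sigma * z)
        weight_sum += w
    return total / weight_sum

# Nominal advantage of 200 points, with increasingly uncertain ratings.
for sigma in (0, 100, 200, 400):
    print(sigma, round(observed_win_rate(200, sigma), 3))
```

This ignores other effects, such as the rating pool itself pulling
estimates towards the average, so treat it as an illustration of the
direction of the bias rather than of its size.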

Some examples of rating inaccuracy in human play, even for players who
have played a decent number of games:

When I started out and was still improving rapidly (instead of not at
all), my EGF rating lagged my real strength by about 3 stones for a
year, because I only entered EGF rated tournaments every few months.
This situation is hardly unusual.

Michael was rated 1 dan when the 199X Danish Go Championship started,
but his real strength was probably at least 4 dan, because he had
prepared seriously for months, correctly believing that doing so might
give him a shot at winning the first-prize free trip to Tokyo (paid
for by the Japanese embassy) for the World Amateur Go Championship.

John Doe is rated 1 dan, but who knows how he will play after having
nothing to do with go for six months?

For computers, you get different sources of uncertainty. Since the
current policy on CGOS is that programs that have changed a lot
should get new usernames, you do not expect usernames to improve in
strength much these days. But you do get weird upsets caused by bugs,
crashes, or setup problems. Kartoffel's habit of losing on time to
weak opponents and beating stronger ones was just the most recent
prominent example.

If you're measuring your program's performance in a private
tournament, you try to eliminate the jokers so that real differences
in strength are magnified (as compared to public games with many
uncertainties). For instance, you would try to measure against a
stable opponent, not one that has weird bugs.

Over-fitting may well be an even more important factor that makes win
rate conversions uncertain, but that is a different topic.
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
