When you measure win rates against players with a given rating, you measure both how well player strength predicts the probability of winning and how accurately the ratings reflect player strength. Sometimes the ratings are quite inaccurate. This causes win rates to regress towards 50%, because the true strength differences behind a given rating gap are blurred by rating error. If you can increase the accuracy of the ratings, this tendency to regress is reduced. So a win-rate->strength conversion that works for a stable setup with accurate ratings should not be expected to apply in a less stable setting with less certain ratings.
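To make the regression towards 50% concrete, here is a minimal sketch under assumptions that are not in the original: ratings follow the standard Elo logistic formula (the EGF system uses its own formula), and the true strength difference equals the rated difference plus independent Gaussian noise with standard deviation `sigma` Elo points. The helper names `elo_expected` and `observed_win_rate` are hypothetical. The sketch averages the Elo expected score over the true differences that could lie behind a given rated gap.

```python
import math


def elo_expected(diff: float) -> float:
    """Elo expected score for the side that is `diff` points stronger (assumed model)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def observed_win_rate(rated_diff: float, sigma: float, steps: int = 2001) -> float:
    """Average expected score against opponents rated `rated_diff` points weaker,
    assuming the true difference is rated_diff + e with e ~ Normal(0, sigma).

    The Gaussian is integrated numerically over +/- 6 sigma on an evenly spaced
    grid; dividing by the summed weights normalises the truncated density.
    """
    if sigma <= 0:
        # Perfectly accurate ratings: the plain Elo prediction applies.
        return elo_expected(rated_diff)
    lo, hi = -6.0 * sigma, 6.0 * sigma
    h = (hi - lo) / (steps - 1)
    total = 0.0
    weight_sum = 0.0
    for i in range(steps):
        e = lo + i * h
        w = math.exp(-0.5 * (e / sigma) ** 2)  # unnormalised Gaussian density
        total += w * elo_expected(rated_diff + e)
        weight_sum += w
    return total / weight_sum


if __name__ == "__main__":
    # A player rated 200 points above the opposition: the win rate you would
    # observe shrinks towards 0.5 as the ratings get noisier.
    for sigma in (0, 100, 200, 400):
        print(f"sigma={sigma:>3}: {observed_win_rate(200, sigma):.3f}")
```

With `sigma` set to 0 the script reproduces the plain Elo prediction (about 0.76 for a 200-point edge); as `sigma` grows, the same rated gap yields a lower observed win rate, always staying above 0.5. The Gaussian noise model is only an illustration; a lagging rating like the ones described below is a systematic bias rather than symmetric noise.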
Some examples of rating inaccuracy in human play, even for players who have played a decent number of games:

When I started out and was still improving rapidly (instead of not at all), my EGF rating lagged my real strength by about 3 stones for a year, because I only entered EGF-rated tournaments every few months. This situation is hardly unusual.

Michael was rated 1 dan when the 199X Danish Go Championship started, but he was probably playing at 4 dan strength or better, because he had prepared seriously for months, correctly believing that doing so might give him a shot at the first prize: a free trip to Tokyo (paid for by the Japanese embassy) for the World Amateur Go Championship.

John Doe is rated 1 dan, but who knows how he will play after having nothing to do with go for six months?

For computers, you get different sources of uncertainty. Since the current policy on CGOS is that programs that have changed a lot should get new usernames, you do not expect a given username to improve much in strength these days. But you do get weird upsets caused by bugs, crashes, or setup problems. Kartoffel's habit of losing on time to weak opponents while beating stronger ones was just the most recent prominent example.

If you're measuring your program's performance in a private tournament, you try to eliminate the jokers so that real differences in strength are magnified (compared to public games with their many uncertainties). For instance, you would try to measure against a stable opponent, not one that has weird bugs.

Over-fitting may well be an even more important factor that makes win-rate conversions uncertain, but that is a different topic.

_______________________________________________
computer-go mailing list
http://www.computer-go.org/mailman/listinfo/computer-go/
