Eric,

Yes, I agree with everything you said. Well put.

I believe computer programs are much more stable than human players,
except for the forfeit problem you mentioned. Assuming a program
doesn't forfeit in stupid ways, it NEVER has bad days, wakes up on
the wrong side of the bed, gets in a fight with its spouse, or gets
inspired to play well one day and depressed the next.
Also, even a program's errors are consistent in some sense, typical of
its particular style and level, whereas humans can occasionally make
errors far below their expected level, such as strong grandmasters
missing a mate in one. Rare, but possible.

- Don


On Wed, 2008-08-27 at 22:03 -0400, Eric Boesch wrote:
> When you measure win rates against players with a given rating, you
> measure both how well player strength predicts the probability of
> winning and how accurately the ratings reflect player strength.
> Sometimes the ratings are quite inaccurate, and this causes win rates
> to regress towards 50%. If you can increase the accuracy of the
> ratings, this tendency to regress is reduced. So a win-rate->strength
> conversion that works for a stable setup with accurate ratings should
> not be expected to apply in a less stable setting with less certain
> ratings.
> 
> Some examples of rating inaccuracy in human play, even for players who
> have played a decent number of games:
> 
> When I started out and was still improving rapidly (instead of not at
> all), my EGF rating lagged my real strength by about 3 stones for a
> year, because I only entered EGF rated tournaments every few months.
> This situation is hardly unusual.
> 
> Michael was rated 1 dan when the 199X Danish Go Championship started,
> but he was probably playing at 4 dan strength or better, because he had
> prepared seriously for months, correctly believing that doing so might
> give him a shot at winning the first-prize free trip to Tokyo (paid
> for by the Japanese embassy) for the World Amateur Go Championship.
> 
> John Doe is rated 1 dan, but who knows how he will play after having
> nothing to do with go for six months?
> 
> For computers, you get different sources of uncertainty. Since the
> current policy on CGOS is that programs that have changed a lot
> should get new usernames, you do not expect a given username to improve
> in strength much these days. But you do get weird upsets caused by bugs,
> crashes, or setup problems. Kartoffel's habit of losing on time to
> weak opponents and beating stronger ones was just the most recent
> prominent example.
> 
> If you're measuring your program's performance in a private
> tournament, you try to eliminate the jokers so that real differences
> in strength are magnified (as compared to public games with many
> uncertainties). For instance, you would try to measure against a
> stable opponent, not one that has weird bugs.
> 
> Overfitting may well be an even more important factor that makes
> win-rate conversions uncertain, but that is a different topic.
> _______________________________________________
> computer-go mailing list
> computer-go@computer-go.org
> http://www.computer-go.org/mailman/listinfo/computer-go/
