As promised, to answer Rémi, I did a study with Mogo vs. Gnugo at various levels. There is NO self-play involved; Gnugo-3.7.11 is the only opponent for progressively higher-rated versions of Mogo.
Here are the raw results so far:

Rank Name           Elo    +    - games score oppo. draws
   1 Mogo_10       2319   72   60   500   95%  1800    0%
   2 Mogo_11       2284   94   74   259   94%  1800    0%
   3 Mogo_09       2234   57   49   500   92%  1800    0%
   4 Mogo_08       2124   43   39   500   87%  1800    0%
   5 Mogo_07       2016   35   33   500   78%  1800    0%
   6 Mogo_06       1961   32   30   500   72%  1800    0%
   7 Mogo_05       1814   28   28   500   52%  1800    0%
   8 Gnugo-3.7.11  1800   13   13  5259   44%  1823    0%
   9 Mogo_04       1711   29   29   500   37%  1800    0%
  10 Mogo_03       1534   35   38   500   18%  1800    0%
  11 Mogo_02       1281   60   72   500    5%  1800    0%
  12 Mogo_01       1004  115  178   500    1%  1800    0%

The issue is whether self-play results distort the ratings of programs. In this case, we are only testing whether it distorts the ratings of Mogo, since no other programs were tested.

In the table below, I played up to 500 games between Gnugo and Mogo at various levels. The levels are the exact levels that correspond to the big scalability study. In the middle column I listed the ratings as computed by bayeselo from games against ONLY Gnugo, with the default rating of Gnugo set to 1800, just as in the study.

Unfortunately, I used level 10 in the Gnugo-only games, but in the big study we use level 8. It's my understanding that there is little difference between these two, but we can probably assume Mogo might be a little better than indicated relative to the big scalability study.

It looks like there is indeed a lot of distortion at the low end of the scale. Mogo seems much stronger at low levels than the larger scalability study indicated. At the higher levels we also get a mismatch, where Mogo's rating doesn't seem as high when playing only Gnugo. This is as Rémi claims.

One thing to note is that at higher levels it's more difficult to get an accurate rating. Mogo_10 is winning 95% of its games against Gnugo, and an extra win or loss every few games can make a lot of difference. However, I am inclined to believe this is real, since it seems to hold for several upper levels. At level 7 the gap is only 42 Elo, but at levels beyond this it's over 100 Elo.
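To illustrate why the ratings get noisy near 95%, here is a rough sketch. It assumes the plain logistic Elo curve against a single opponent fixed at 1800; bayeselo's actual model includes a prior and other refinements, so it will not reproduce the table exactly. The function names are made up for this example.

```python
import math

ANCHOR_ELO = 1800.0  # Gnugo's fixed rating, as in the study


def elo_from_score(score, anchor=ANCHOR_ELO):
    """Rating implied by a win fraction against one anchored opponent.

    Assumption: standard logistic Elo curve, not bayeselo's full model.
    """
    return anchor + 400.0 * math.log10(score / (1.0 - score))


def swing(wins, games, delta=5):
    """Change in implied rating if `delta` more games had been won."""
    return elo_from_score((wins + delta) / games) - elo_from_score(wins / games)


# Compare the effect of five extra wins out of 500 at two win rates.
for wins in (260, 475):  # about 52% and 95% of 500 games
    rating = elo_from_score(wins / 500)
    print(f"{wins}/500: implied {rating:.0f}, +5 wins shifts it by {swing(wins, 500):.0f}")
```

Under this simplified model, five extra wins out of 500 move the implied rating by about 7 Elo near a 52% score, but by about 40 Elo near 95%, which is why the top few levels carry the widest error bars.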
I've never doubted that there is some intransitivity between programs, but I am a little surprised that it is this much. Even if the comparison is slightly unfair due to Mogo playing a stronger version of Gnugo in this study, it still seems like it must be at least 100 Elo.

vers  vs Gnu  Study
----  ------  -----
 01     1004    688
 02     1281   1093
 03     1534   1331
 04     1711   1554
 05     1814   1751
 06     1961   1971
 07     2016   2058
 08     2124   2270
 09     2234   2347
 10     2319   2470

My suggestion to improve this situation is to play a few thousand games against a well-rated Gnugo and set up Mogo as a second anchor.

- Don
