Here is an update from the new 1000-game test using gnugo at level 8 instead of 10.

Rank Name           Elo    +    -  games  score  oppo.  draws
   1 Gnugo-3.7.11  1800   34   30   2186    97%   1137     0%
   2 Mogo_03       1507   48   56    186    16%   1800     0%
   3 Mogo_02       1202   43   51   1000     3%   1800     0%
   4 Mogo_01       1003   70   96   1000     1%   1800     0%

The test, at this point, seems to indicate that gnugo at level 8 is
stronger than at level 10, because Mogo is not doing as well as in the
previous test (Mogo_02 drops from 1281 to 1202, Mogo_03 from 1534 to
1507). It will be more meaningful when we get to Mogo levels close to
gnugo's strength.

- Don

> As promised, to answer Rémi, I did a study with Mogo vs Gnu at various
> levels. There is NO self-play involved: Gnugo-3.7.11 is the only
> opponent for progressively stronger versions of Mogo.
>
> Here are the raw results so far:
>
> Rank Name           Elo    +    -  games  score  oppo.  draws
>    1 Mogo_10       2319   72   60    500    95%   1800     0%
>    2 Mogo_11       2284   94   74    259    94%   1800     0%
>    3 Mogo_09       2234   57   49    500    92%   1800     0%
>    4 Mogo_08       2124   43   39    500    87%   1800     0%
>    5 Mogo_07       2016   35   33    500    78%   1800     0%
>    6 Mogo_06       1961   32   30    500    72%   1800     0%
>    7 Mogo_05       1814   28   28    500    52%   1800     0%
>    8 Gnugo-3.7.11  1800   13   13   5259    44%   1823     0%
>    9 Mogo_04       1711   29   29    500    37%   1800     0%
>   10 Mogo_03       1534   35   38    500    18%   1800     0%
>   11 Mogo_02       1281   60   72    500     5%   1800     0%
>   12 Mogo_01       1004  115  178    500     1%   1800     0%
>
>
> The issue is whether self-play results distort the rating of programs.
> In this case, we are only testing whether they distort the ratings of
> Mogo, since no other programs were tested.
>
> In the following table, I played up to 500 games between Gnugo and Mogo
> at various levels. The levels are exactly the levels used in the big
> scalability study. In the middle column I listed the ratings as
> computed by bayeselo from games against ONLY Gnugo, with the default
> rating of Gnugo set to 1800, just as in the study.
>
> Unfortunately, I used level 10 in the gnugo-only games, but in the big
> study we use level 8.
> It's my understanding that there is little difference between these
> 2, but we can probably assume Mogo is a little stronger than indicated
> here, relative to the big scalability study.
>
> It looks like there is indeed a lot of distortion at the low end of
> the scale. Mogo seems much stronger at low levels than the big
> scalability study indicated.
>
> At the higher levels we also get a mismatch: Mogo's rating is not as
> high when it plays only Gnugo. This is what Rémi claimed.
>
> One thing to note is that at higher levels it's more difficult to get
> an accurate rating. Mogo_10 is winning 95% of its games against Gnugo,
> and an extra win or loss every few games can make a lot of difference.
> However, I am inclined to believe the effect is real, since it holds for
> several upper levels. At level 7 the gap is only 42 Elo, but at the
> levels beyond that it is over 100 Elo.
>
> I've never doubted that there is some intransitivity between programs,
> but I am a little surprised that there is this much. Even if the
> comparison is slightly unfair because Mogo played a stronger version of
> Gnugo in this study, it still seems that the gap must be at least
> 100 Elo.
>
>
> vers  vs Gnu  Study
> ----  ------  -----
>  01     1004    688
>  02     1281   1093
>  03     1534   1331
>  04     1711   1554
>  05     1814   1751
>  06     1961   1971
>  07     2016   2058
>  08     2124   2270
>  09     2234   2347
>  10     2319   2470
>
>
> My suggestion to improve this situation is to play a few thousand
> games against a well-rated Gnugo and set up Mogo as a second anchor.
>
> - Don
>
>
>
> _______________________________________________
> computer-go mailing list
> [email protected]
> http://www.computer-go.org/mailman/listinfo/computer-go/
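Every Mogo version here plays only gnugo, and gnugo is pinned at 1800. Because of that, each Mogo rating is essentially a function of its score against gnugo. The minimal Python sketch below shows that relationship and why a 95% score is hard to rate precisely. It assumes the plain logistic Elo curve, not bayeselo's actual model, which adds a prior, so bayeselo's figures differ by a few points. The names GNUGO_RATING, elo_from_score and elo_from_record are illustrative, not part of any tool.

```python
import math

# Assumption: gnugo is the fixed anchor at 1800, as in both tests.
GNUGO_RATING = 1800.0


def elo_from_score(score: float, opponent: float = GNUGO_RATING) -> float:
    """Rating implied by a score (fraction of points) against one opponent.

    Inverts the standard logistic Elo expectation
    E = 1 / (1 + 10 ** (-D / 400)), giving D = 400 * log10(E / (1 - E)).
    This is a simplification; bayeselo fits a Bayesian model instead.
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return opponent + 400.0 * math.log10(score / (1.0 - score))


def elo_from_record(wins: int, games: int, opponent: float = GNUGO_RATING) -> float:
    """Same as elo_from_score, starting from a win count (no draws in Go here)."""
    return elo_from_score(wins / games, opponent)


if __name__ == "__main__":
    # Sensitivity near a 95% score: a handful of games moves the estimate a lot.
    for wins in (470, 475, 480):
        print(f"{wins}/500 -> {elo_from_record(wins, 500):.0f}")
    # The same five-game swing near 50% barely matters.
    for wins in (250, 255):
        print(f"{wins}/500 -> {elo_from_record(wins, 500):.0f}")
```

Under this formula, 470, 475 and 480 wins out of 500 give roughly 2278, 2312 and 2352. Five extra wins in 500 games therefore move the estimate by 34 to 40 Elo near a 95% score, but only about 7 Elo near 50%.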

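The per-level gaps between the two rating columns of the vers / vs Gnu / Study table can be tabulated with the trivial Python sketch below. The data is copied from that table. The names RESULTS and rating_gaps are illustrative. The sign convention is an assumption: study rating minus gnugo-only rating, so a negative value means the big study rated Mogo lower than the gnugo-only games did.

```python
# (version, rating from games vs gnugo only, rating in the big study),
# copied from the vers / vs Gnu / Study table.
RESULTS = [
    ("01", 1004, 688),
    ("02", 1281, 1093),
    ("03", 1534, 1331),
    ("04", 1711, 1554),
    ("05", 1814, 1751),
    ("06", 1961, 1971),
    ("07", 2016, 2058),
    ("08", 2124, 2270),
    ("09", 2234, 2347),
    ("10", 2319, 2470),
]


def rating_gaps(rows):
    """Map each version to (study rating - gnugo-only rating)."""
    return {version: study - vs_gnu for version, vs_gnu, study in rows}


if __name__ == "__main__":
    for version, gap in rating_gaps(RESULTS).items():
        print(f"Mogo_{version}: {gap:+d}")
```

This gives gaps between -157 and -316 at levels 01 to 04, -63 at level 05, +10 at level 06, +42 at level 07, and +113 to +151 at levels 08 to 10.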