[computer-go] Is Rémi correct?

Don Dailey Tue, 05 Feb 2008 11:44:25 -0800

As promised,  to answer Rémi, I did a study with Mogo vs Gnugo at various
levels.   There is NO self-play involved; Gnugo-3.7.11 is the only
opponent for progressively higher-rated versions of Mogo.


Here are the raw results so far:

Rank Name           Elo    +    - games score oppo. draws
   1 Mogo_10       2319   72   60   500   95%  1800    0%
   2 Mogo_11       2284   94   74   259   94%  1800    0%
   3 Mogo_09       2234   57   49   500   92%  1800    0%
   4 Mogo_08       2124   43   39   500   87%  1800    0%
   5 Mogo_07       2016   35   33   500   78%  1800    0%
   6 Mogo_06       1961   32   30   500   72%  1800    0%
   7 Mogo_05       1814   28   28   500   52%  1800    0%
   8 Gnugo-3.7.11  1800   13   13  5259   44%  1823    0%
   9 Mogo_04       1711   29   29   500   37%  1800    0%
  10 Mogo_03       1534   35   38   500   18%  1800    0%
  11 Mogo_02       1281   60   72   500    5%  1800    0%
  12 Mogo_01       1004  115  178   500    1%  1800    0%


The issue is whether self-play results distort the rating of programs. 
In this case, we are only testing whether it distorts the ratings of
Mogo since no other programs were tested.

In the following table,  I played up to 500 games between Gnugo and Mogo
at various levels.   The levels are the exact levels that correspond to
the big scalability study.      In the middle column I listed the
ratings as computed by bayeselo in games against  ONLY Gnugo and set the
default rating of Gnugo to 1800, just as in the study.

Unfortunately,  I used level 10 in the gnugo only games but in the big
study we use level 8.   It's my understanding there is little difference
between these 2 but we can probably assume Mogo might be a little better
than indicated relative to the big scalability study. 

It looks like there indeed is a lot of distortion at the low end of the
scale.  Mogo seems much stronger at low levels than the larger
scalability study indicated:  at level 01 it rates 1004 against Gnugo
alone versus 688 in the study, a gap of over 300 ELO.

At the higher levels,  we also get a mismatch,  where Mogo's rating
doesn't seem as high when playing only Gnugo.   This is as Rémi
claims.      

One thing to note is that at higher levels it's more difficult to get an
accurate rating.  Mogo_10 is winning 95% of its games against Gnugo, 
and an extra win or loss every few games can make a lot of difference.  
However I am inclined to believe this is real since it seems to hold for
several upper levels.   At level 7 it's only 42 ELO, but at levels
beyond this it's over 100 ELO.  
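
To illustrate how sensitive the rating is near a 95% score, here is a
minimal sketch using the standard logistic Elo formula.  This is NOT
what bayeselo computes (it fits a Bayesian model with a prior), so the
numbers only approximate the table; the function name elo_diff is mine.

```python
import math

def elo_diff(win_rate: float) -> float:
    """Elo difference implied by a win rate under the plain logistic model.

    Assumption: the textbook formula 400 * log10(p / (1 - p)), not
    bayeselo's model, so results differ slightly from the table.
    """
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))

# A swing of five wins either way in a 500-game match at about 95%.
for wins in (470, 475, 480):
    p = wins / 500
    print(f"{wins}/500 = {p:.0%} -> {elo_diff(p):+.0f} Elo over the opponent")
```

Under this simple model, one percentage point of score near 95% moves
the rating by some 30 to 40 ELO, which is why the top of the table has
wider error bars.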

I've never doubted that there is some intransitivity between programs, but
I am a little surprised that it is this much.  Even if the comparison is
slightly unfair due to Mogo playing a stronger version of Gnugo in this
study,  it still seems like it must be at least 100 ELO.


vers  vs Gnu  Study
----  ------  -----
  01    1004    688
  02    1281   1093
  03    1534   1331
  04    1711   1554
  05    1814   1751
  06    1961   1971
  07    2016   2058
  08    2124   2270
  09    2234   2347
  10    2319   2470
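
For anyone who wants the per-level gaps spelled out, the following
sketch just subtracts the two columns of the table above.  The numbers
are copied from the table; the variable names are mine.

```python
# (level, rating vs Gnugo only, rating in the scalability study)
ratings = [
    (1, 1004, 688),
    (2, 1281, 1093),
    (3, 1534, 1331),
    (4, 1711, 1554),
    (5, 1814, 1751),
    (6, 1961, 1971),
    (7, 2016, 2058),
    (8, 2124, 2270),
    (9, 2234, 2347),
    (10, 2319, 2470),
]

# Positive means the study (with self-play) rates Mogo higher
# than the Gnugo-only games do.
diffs = {level: study - vs_gnu for level, vs_gnu, study in ratings}

for level, gap in diffs.items():
    print(f"{level:02d}  {gap:+5d}")
```

The sign flips between levels 05 and 06:  below that the study
underrates Mogo relative to the Gnugo-only games, above it the study
overrates it.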


My suggestion to improve this situation is to play a few thousand games
against a well-rated Gnugo and set up Mogo as a second anchor.

- Don



