Hi,

Following Arend's advice, gg378 and twin-378 played an 85-game endgame match:

- twin: 26 wins (1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 3 5 7 10 14 15 21 25 28)
- GNU Go: 14 wins (-9 -3 -3 -3 -2 -2 -2 -1 -1 -1 -1 -1 -1 -1)
- unchanged: 45

The sum is +135, an average of +1.6 points per game over the 85 games.
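(For clarity, a minimal sketch of how such a summary falls out of the per-game
score margins; the helper below is hypothetical, not part of the GNU Go tree:)

def summarize(margins):
    """Summarize a head-to-head match from per-game score margins:
    positive = twin wins by that many points, negative = GNU Go wins,
    zero = unchanged result. margins must be non-empty."""
    wins = [m for m in margins if m > 0]
    losses = [m for m in margins if m < 0]
    unchanged = sum(1 for m in margins if m == 0)
    total = sum(margins)
    return (len(wins), len(losses), unchanged, total, total / len(margins))

# e.g.: wins, losses, unchanged, total, avg = summarize(margins)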
That average looks fine, _but_ when one looks at the attached plot of cumulative +PASS -FAIL versus game_status, the twin fails a lot of endgame tests (game_status > 0.85). It is already a huge task to check the big failures, and I feel too lazy to investigate these 40 tests and the more than 50 endgame regressions (and I am a very bad yose player ;-).

By construction, the twin "knows" exactly how gg378 evaluates the game, so it may steal a big point before gg378 plays it, but that is still gnugo logic. So I wonder whether this endgame match is significant, or just a systematic error. In other words, a reliable endgame comparison would require another engine, one that is good at the endgame, and would compare the results of both engines against that reference engine.

Am I right, or just paranoid? Is there such an engine available?

- Alain

PS: the plot includes all board sizes; it is not so flat when they are separated, but I did too much clean-up and erased those results, so... I am re-running the regression tests :(
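(For reference, the attached curve is just a running sum over the regression
tests ordered by game_status: +1 per PASS, -1 per FAIL, so a sagging tail past
game_status 0.85 is the endgame weakness described above. A minimal sketch,
assuming per-test (game_status, passed) pairs as input; the function name is
made up, not an existing GNU Go tool:)

def cumulative_pass_fail(results):
    """results: iterable of (game_status, passed) pairs, one per test.
    Returns x/y data for the cumulative +PASS -FAIL curve."""
    xs, ys = [], []
    running = 0
    for status, passed in sorted(results):
        running += 1 if passed else -1
        xs.append(status)
        ys.append(running)
    return xs, ys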
[Attachment: twin4-d1.5_cumul+P-F_vs_gstatus.png - plot of cumulative +PASS -FAIL versus game_status]