On Wed, Feb 07, 2007 at 12:06:40PM +0200, Tapani Raiko wrote:
> Let me try again using the handicap example. Let's say the MC player is given 
> a huge handicap. In the simulations, it is winning all of its games, so 
> there is no information helping to select the next move. 

This situation happens in normal games too, once one player is so far
ahead that it wins almost no matter what. It leads to really
stupid-looking endgames, where live groups are allowed to die and dead
ones are allowed to be rescued.

All this could be avoided by a simple rule: instead of using +1 and -1
as the results, use +1000 for a win and -1000 for a loss, and add the
final score (from the program's point of view) to that.
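A minimal sketch of the rule in Python. The function name is hypothetical, the score is assumed to be signed from the program's point of view (positive means it won by that many points), and scoring a draw as 0 is an assumption the post does not cover:

```python
WIN_BONUS = 1000  # the post's constant; larger than any realistic 19x19 score margin


def playout_result(final_score: float) -> float:
    """Reward for one finished playout instead of the usual +1/-1.

    final_score > 0 means the program won by final_score points,
    final_score < 0 means it lost by -final_score points.
    """
    if final_score > 0:
        return WIN_BONUS + final_score
    if final_score < 0:
        return -WIN_BONUS + final_score
    return 0.0  # jigo: assumption, the post does not say how to score draws


print(playout_result(12.5))  # 1012.5
print(playout_result(-0.5))  # -1000.5
```

Every win still maps to a positive reward and every loss to a negative one, so the sign carries the same information as +1/-1; the score only breaks ties among wins and among losses.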

The purpose of the large constant (1000) is to make sure that the program
prefers any win to any loss (so that large_win + small_loss < small_win +
small_win). One could even add another term to the result, favouring
games that end early for the winner and games that go on longer for the
loser, the latter in the hope of giving the opponent more chances to make
mistakes.
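The optional game-length term can be sketched as below. LENGTH_WEIGHT and the per-move penalty/bonus are assumptions for illustration, not something the post specifies; the weight just has to stay small enough that the length never outweighs the win/loss constant. The last lines check the post's inequality with made-up margins:

```python
WIN_BONUS = 1000      # the post's large constant
LENGTH_WEIGHT = 0.01  # hypothetical per-move weight, not from the post


def playout_reward(final_score: float, num_moves: int = 0) -> float:
    """Reward for one playout; final_score > 0 means we won by that margin."""
    if final_score > 0:
        # Winner: prefer bigger margins and shorter games.
        return WIN_BONUS + final_score - LENGTH_WEIGHT * num_moves
    if final_score < 0:
        # Loser: prefer smaller losses and longer games,
        # giving the opponent more chances to go wrong.
        return -WIN_BONUS + final_score + LENGTH_WEIGHT * num_moves
    return 0.0  # draw (jigo): an assumption, the post does not mention draws


# The post's inequality: a large win plus a small loss is worth less
# than two small wins, because the constant dominates the sum.
big_plus_loss = playout_reward(60.5) + playout_reward(-2.5)
two_small_wins = playout_reward(2.5) + playout_reward(2.5)
print(big_plus_loss, two_small_wins)  # 58.0 2005.0
```

For example, playout_reward(10, 100) is 1009.0 while playout_reward(10, 200) is 1008.0, so the shorter win is preferred; for the same losses the longer game scores higher.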

As far as I can see, this ought to fit straight into any MC or UCT
program. It may not improve the winning chances, but it should certainly
make the program's play look more reasonable.
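To illustrate the handicap case from the quoted message, here is a flat Monte Carlo sketch (not UCT) that averages each candidate move's playout rewards. The moves "A"/"B" and their playout scores are invented for illustration. With +1/-1 every move averages 1.0 when all playouts are won, while the scaled reward still separates a move that keeps the lead from one that throws it away. In a UCT program the same reward would be backed up through the tree instead; whether its exploration constant needs retuning for the larger reward range is an open assumption here.

```python
from collections.abc import Callable
from statistics import mean

WIN_BONUS = 1000  # the post's large constant


def binary_reward(score: float) -> float:
    """Classic MC result: +1 for a win, -1 for a loss."""
    return 1.0 if score > 0 else -1.0


def scaled_reward(score: float) -> float:
    """The proposed result: +/-1000 plus the final score."""
    return (WIN_BONUS if score > 0 else -WIN_BONUS) + score


def mean_rewards(playouts: dict[str, list[float]],
                 reward: Callable[[float], float]) -> dict[str, float]:
    """Flat Monte Carlo: average the reward over each move's playouts."""
    return {move: mean(reward(s) for s in scores)
            for move, scores in playouts.items()}


# Hypothetical playout scores in a handicap game: every playout is a win.
playouts = {
    "A": [30.5, 20.5, 40.5],  # keeps a large lead
    "B": [5.5, 2.5, 8.5],     # gives most of the lead away
}

print(mean_rewards(playouts, binary_reward))  # {'A': 1.0, 'B': 1.0}
print(mean_rewards(playouts, scaled_reward))  # {'A': 1030.5, 'B': 1005.5}
```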


Just my humble idea. Feel free to shoot it down (with serious arguments),
and/or use it wherever you like. I would like to hear whether this makes
any practical difference, if anyone tries it.

   - Heikki



-- 
Heikki Levanto   "In Murphy We Turst"     heikki (at) lsd (dot) dk

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
