In a Monte-Carlo program, the amount of information derived from one
playout is given by its entropy
-p(win).log_2(p(win)) - p(lose).log_2(p(lose)).
This has a maximum at p(win) = 0.5, and is 0 if p(win) is 0 or 1.
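
As a quick check of the formula above, here is a minimal Python sketch. The function name playout_entropy is my own invention, and it treats each playout as a simple win/lose outcome with a fixed p(win).

```python
import math

def playout_entropy(p_win: float) -> float:
    """Binary entropy, in bits, of one playout that wins with probability p_win."""
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must lie in [0, 1]")
    if p_win in (0.0, 1.0):
        return 0.0  # the outcome is certain, so the playout tells us nothing
    p_lose = 1.0 - p_win
    return -p_win * math.log2(p_win) - p_lose * math.log2(p_lose)

# Tabulate a few values to see the peak at p(win) = 0.5.
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p(win) = {p:.1f}: {playout_entropy(p):.3f} bits")
```

The values are symmetric about p(win) = 0.5, where a playout yields exactly one bit.
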
So, suppose your MC Go-playing program is doing its playouts, and has
found several moves which have all won more than 90% of the time. It
can do more playouts with these moves, but this is a poor way of getting
more information about which of them is best. If, instead, it pretends
that it will have to give an extra 10 points of komi, maybe it finds
that these moves now win, on average, only 60% of the time. Now the
entropy of the playouts is greater, so it is gathering information
faster. The information is not as good, since it is measuring the wrong
thing; but it is gathering it more than twice as fast, which should more
than compensate.
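
To put numbers on "more than twice as fast", here is a small Python sketch comparing the bits per playout at the two win rates used above, 90% and 60%. The helper name bits_per_playout is hypothetical, and the comparison assumes the win rates are exactly those figures.

```python
import math

def bits_per_playout(p_win: float) -> float:
    """Binary entropy, in bits, of a playout won with probability p_win (0 < p_win < 1)."""
    p_lose = 1.0 - p_win
    return -p_win * math.log2(p_win) - p_lose * math.log2(p_lose)

before = bits_per_playout(0.9)  # real komi: moves win about 90% of playouts
after = bits_per_playout(0.6)   # pretended extra komi: about 60%
ratio = after / before

print(f"90% win rate: {before:.3f} bits per playout")
print(f"60% win rate: {after:.3f} bits per playout")
print(f"ratio: {ratio:.2f}")
```

With win rates above 90% the first figure shrinks further, so the ratio only grows.
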
Similarly, suppose its best move has won less than 10% of the playouts.
It could resign, but let's say it is giving a handicap to a weaker
player. Instead of just doing more playouts, it can pretend that it
will be receiving extra komi. Again, the quality of the information per
playout then drops, but the quantity goes up, hopefully by more than
enough to compensate.
This seems like an argument for using dynamic komi, adjusted from time
to time during each game move.
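
One way such an adjustment might look, as a rough Python sketch only. The function adjust_komi, the 40%-60% target band, and the one-point step are all assumptions of mine for illustration; they are not taken from any existing program.

```python
def adjust_komi(extra_komi: float, win_rate: float,
                low: float = 0.4, high: float = 0.6,
                step: float = 1.0) -> float:
    """Return an updated pretended komi offset.

    extra_komi > 0 means the program pretends to give that many extra
    points; extra_komi < 0 means it pretends to receive them.
    The band (low, high) and the step size are hypothetical tuning values.
    """
    if win_rate > high:
        return extra_komi + step  # winning too easily: pretend to give more
    if win_rate < low:
        return extra_komi - step  # losing too often: pretend to receive more
    return extra_komi             # playouts are already informative enough

# Example: called from time to time while searching for one move.
komi = 0.0
for observed in (0.93, 0.88, 0.71, 0.58):
    komi = adjust_komi(komi, observed)
    print(f"win rate {observed:.2f} -> pretended extra komi {komi:+.1f}")
```

The observed win rates in the example are made up, purely to exercise the function.
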
Nick
--
Nick Wedd [email protected]