In a Monte-Carlo program, the amount of information derived from one playout is given by its entropy
  -p(win).log_2(p(win)) - p(lose).log_2(p(lose)).
This has a maximum of 1 bit at p(win) = 0.5, and is 0 if p(win) is 0 or 1.
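
As a quick check of the formula above, here is a minimal Python sketch. The function name playout_entropy is just something I made up for illustration; it is not taken from any existing Go program.

```python
import math

def playout_entropy(p_win: float) -> float:
    """Entropy, in bits, of one win/lose playout with win probability p_win."""
    if p_win <= 0.0 or p_win >= 1.0:
        return 0.0  # the outcome is certain, so the playout tells us nothing
    p_lose = 1.0 - p_win
    return -p_win * math.log2(p_win) - p_lose * math.log2(p_lose)
```

For example, playout_entropy(0.5) returns 1.0, and playout_entropy(0.9) is about 0.47.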

So, suppose your MC Go-playing program is doing its playouts, and has found several moves which have all won more than 90% of the time. It can do more playouts with these moves, but this is a poor way of getting more information about which of them is best. If instead it pretends that it will have to give an extra 10 points of komi, maybe it finds that these moves now win, on average, only 60% of the time. Now the entropy of the playouts is greater (about 0.97 bits each at 60%, against about 0.47 bits at 90%), so it is gathering information faster. The information is not as good, because it is measuring the wrong thing; but it is gathering it more than twice as fast, which should more than compensate.

Similarly, suppose its best move has won less than 10% of the playouts. It could resign, but let's say it is giving a handicap to a weaker player. Instead of just doing more playouts, it can pretend that it will be receiving extra komi. Again, the quality of the information per playout then drops, but the quantity goes up, hopefully by more than enough to compensate.

This seems like an argument for using dynamic komi, adjusted from time to time while the program is thinking about each move.
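
One way the adjustment might look, as a rough sketch only. The function adjust_komi, the 10%/90% thresholds and the 10-point step are all assumptions borrowed from the examples above, not a tested scheme, and a real program would need to decide how often to call it and how to treat playouts already done under the old komi.

```python
def adjust_komi(extra_komi: float, win_rate: float,
                low: float = 0.1, high: float = 0.9,
                step: float = 10.0) -> float:
    """Return the pretended extra komi to use for the next batch of playouts.

    extra_komi: points the program currently pretends to give
                (negative means it pretends to receive them).
    win_rate:   fraction of recent playouts won by the best move
                under the current pretended komi.
    The thresholds and step size are illustrative assumptions.
    """
    if win_rate > high:
        return extra_komi + step  # winning too easily: pretend to give more
    if win_rate < low:
        return extra_komi - step  # losing too badly: pretend to receive more
    return extra_komi             # playouts are still informative: leave it
```

So adjust_komi(0.0, 0.95) gives 10.0, adjust_komi(0.0, 0.05) gives -10.0, and a win rate of 0.6 leaves the komi unchanged.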

Nick
--
Nick Wedd    [email protected]
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
