[Computer-go] naive thoughts on enthalpy and dynamic komi

Nick Wedd Fri, 04 Feb 2011 05:04:17 -0800

In a Monte-Carlo program, the amount of information derived from one playout is given by the entropy of its result

  -p(win).log_2(p(win)) - p(lose).log_2(p(lose)).
This has its maximum, 1 bit, at p(win) = 0.5, and is 0 if p(win) is 0 or 1.
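The formula is the binary entropy function. A minimal Python sketch that evaluates it for a few win rates follows; the function name `playout_entropy` is hypothetical and not taken from any Go program.

```python
import math


def playout_entropy(p_win: float) -> float:
    """Information, in bits, carried by one playout that wins with probability p_win.

    Binary entropy: -p(win).log_2(p(win)) - p(lose).log_2(p(lose)).
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability in [0, 1]")
    if p_win in (0.0, 1.0):
        return 0.0  # a certain result tells us nothing new
    p_lose = 1.0 - p_win
    return -p_win * math.log2(p_win) - p_lose * math.log2(p_lose)


for p in (0.5, 0.6, 0.9, 0.1):
    print(f"p(win) = {p:.1f}: {playout_entropy(p):.3f} bits per playout")
```

This prints roughly 1.000, 0.971, 0.469 and 0.469 bits. The curve is symmetric, so a playout at 10% is exactly as informative as one at 90%, and a playout at 60% carries about 2.07 times as much information as one at 90%.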

So, suppose your MC Go-playing program is doing its playouts, and has found several moves which have all won more than 90% of the time. It can do more playouts with these moves, but this is a poor way of getting more information about which of them is best. If instead it pretends that it will have to give an extra 10 points of komi, maybe it finds that these moves now win, on average, only 60% of the time. Now the entropy of the playouts is greater, so it is gathering information faster. The information is not as good, since it is measuring the wrong thing; but it is gathering it more than twice as fast (about 0.97 bits per playout at 60%, against about 0.47 bits at 90%), which should more than compensate.
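To see the komi trick in numbers, here is a toy Python model. It assumes, purely for illustration, that one move's playout score margins (real komi already included) are normally distributed with mean 12.5 and standard deviation 9.7 points, values chosen so that the move wins about 90% of its playouts. The helper `win_rate` and these parameters are hypothetical, not taken from any real program.

```python
import math
import random


def playout_entropy(p_win: float) -> float:
    """Binary entropy of one playout, in bits; 0 for a certain result."""
    if p_win in (0.0, 1.0):
        return 0.0
    q = 1.0 - p_win
    return -p_win * math.log2(p_win) - q * math.log2(q)


def win_rate(margins: list[float], extra_komi: float) -> float:
    """Fraction of playouts still won after conceding extra_komi points."""
    return sum(m - extra_komi > 0 for m in margins) / len(margins)


# Hypothetical model: normally distributed final score margins for one move.
rng = random.Random(2011)
margins = [rng.gauss(12.5, 9.7) for _ in range(20_000)]

p_real = win_rate(margins, 0.0)      # about 90% in this model
p_shifted = win_rate(margins, 10.0)  # about 60% in this model
for label, p in (("real komi", p_real), ("+10 komi", p_shifted)):
    print(f"{label:>9}: p(win) = {p:.2f}, {playout_entropy(p):.2f} bits per playout")
```

Conceding 10 extra points moves the win rate from about 90% to about 60% and roughly doubles the entropy per playout. The model only shows that gain; it says nothing about how much the shifted komi distorts the comparison between moves, which is the cost the argument weighs against it.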

Similarly, suppose its best move has won less than 10% of the playouts. It could resign, but let's say it is giving a handicap to a weaker player. Instead of just doing more playouts, it can pretend that it will be receiving extra komi. Again, the quality of the information per playout then drops, but the quantity goes up, hopefully by more than enough to compensate.

This seems like an argument for using dynamic komi, adjusted from time to time during the search for each move.
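One way such periodic adjustment might be sketched in Python: after each batch of playouts, a controller compares the win rate with a target band and moves a virtual komi offset by a fixed step. The class name `DynamicKomi`, the 40% to 60% band and the 2-point step are all assumptions for illustration; none of them comes from the post. The offset is subtracted from the program's final score margin (real komi included), so a positive offset means pretending to give extra komi and a negative one means pretending to receive it.

```python
import random


class DynamicKomi:
    """Hypothetical dynamic-komi controller for the search of one move.

    `offset` is subtracted from the program's final score margin (real komi
    included) before a playout is scored: positive means pretending to give
    extra komi, negative means pretending to receive it.
    """

    def __init__(self, low: float = 0.4, high: float = 0.6, step: float = 2.0):
        self.low = low      # below this win rate, pretend to receive more komi
        self.high = high    # above this win rate, pretend to give more komi
        self.step = step    # points moved per adjustment
        self.offset = 0.0

    def is_win(self, score_margin: float) -> bool:
        """Score one playout against the virtual komi."""
        return score_margin - self.offset > 0

    def adjust(self, wins: int, playouts: int) -> None:
        """Call from time to time with counts gathered since the last call."""
        if playouts == 0:
            return
        rate = wins / playouts
        if rate > self.high:
            self.offset += self.step   # winning too easily: give more komi
        elif rate < self.low:
            self.offset -= self.step   # losing too often: receive more komi


# Toy run, assuming normally distributed score margins (mean 12.5, sd 9.7)
# for a move that wins about 90% of playouts at the real komi.
rng = random.Random(2011)
dk = DynamicKomi()
for _ in range(20):
    batch = [rng.gauss(12.5, 9.7) for _ in range(1000)]
    dk.adjust(sum(dk.is_win(m) for m in batch), len(batch))
print(f"virtual komi offset after 20 batches: {dk.offset:+.1f} points")
```

In this toy run the offset climbs in 2-point steps until the win rate falls into the band, which should happen at around 10 to 12 points; for a move that wins fewer than 10% of its playouts the offset goes negative instead, which corresponds to pretending to receive komi.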


Nick
--
Nick Wedd    [email protected]
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
