The crux is "it is gathering it more than twice as fast, which should
more than compensate", which is debatable.  Infinitely more
information may not be enough if you are measuring the wrong thing.
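
As a rough check on the "more than twice as fast" figure, here is a minimal sketch that evaluates the quoted formula at the two win rates mentioned below. The function name playout_entropy is made up for illustration, and 0.9 and 0.6 are just the boundary values from the quoted message, not measurements from any program.

```python
import math


def playout_entropy(p_win: float) -> float:
    """Binary entropy, in bits, of one playout that wins with probability p_win."""
    # By convention 0 * log2(0) is taken as 0, so a certain outcome carries no information.
    if p_win <= 0.0 or p_win >= 1.0:
        return 0.0
    p_lose = 1.0 - p_win
    return -p_win * math.log2(p_win) - p_lose * math.log2(p_lose)


# Assumed illustrative win rates: 90% at the real komi, 60% with the pretended komi.
h_90 = playout_entropy(0.9)
h_60 = playout_entropy(0.6)
ratio = h_60 / h_90

print(f"{h_90:.3f} {h_60:.3f} {ratio:.2f}")  # prints "0.469 0.971 2.07"
```

So the ratio is a little over 2 at exactly 90%, and larger for win rates above 90%. This only counts bits about the playout result under the pretended komi; it says nothing about whether those bits bear on the game at the real komi.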


On Fri, Feb 4, 2011 at 8:02 AM, Nick Wedd <[email protected]> wrote:
> In a Monte-Carlo program, the amount of information derived from one playout
> is given by its entropy
>  -p(win).log_2(p(win)) - p(lose).log_2(p(lose)).
> This has a maximum at p(win) = 0.5, and is 0 if p(win) is 0 or 1.
>
> So, suppose your MC Go-playing program is doing its playouts, and has found
> several moves which have all won more than 90% of the time.  It can do more
> playouts with these moves, but this is a poor way of getting more
> information about which of them is best.  If instead, it pretends that it
> will have to give an extra 10 points of komi, maybe it finds that these
> moves now win, on average, only 60% of the time.  Now the entropy of the
> playouts is greater, so it is gathering information faster.  The information
> is not as good, it is measuring the wrong thing; but it is gathering it more
> than twice as fast, which should more than compensate.
>
> Similarly, suppose its best move has won less than 10% of the playouts. It
> could resign, but let's say it is giving a handicap to a weaker player.
> Instead of just doing more playouts, it can pretend that it will be
> receiving extra komi.  Again, the quality of the information per playout
> then drops, but the quantity goes up, hopefully by more than enough to
> compensate.
>
> This seems like an argument for using dynamic komi, adjusted from time to
> time during each game move.
>
> Nick
> --
> Nick Wedd    [email protected]
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
