Re: [Computer-go] MCTS and perfect endgame

Álvaro Begué Sun, 03 Jul 2011 19:50:58 -0700

On Sun, Jul 3, 2011 at 10:14 PM, terry mcintyre <[email protected]> wrote:
> From: Jean-loup Gailly <[email protected]>
> To: [email protected]
> Sent: Sun, July 3, 2011 9:12:59 AM
> Subject: Re: [Computer-go] MCTS and perfect endgame
>
> Leon,
>> One of the problems (which I tested with gogui, thank you very much)
>> was losing points in the endgame when the program is winning.
> This is by design. Pachi maximises the chance of winning, not the number
> of points. But if you want Pachi to win by more points while increasing
> the risk of losing, you can simply increase the parameter val_scale. See the
> description in uct/uct.c: "How much of the game result value should be
> influenced by win size. Zero means it isn't". The default value is 0.04,
> which is the result of tuning. (If you increase val_scale above this it
> starts losing more.)
>
> Why should this value be static? Shouldn't the behavior change when there is
> a certain win?


It should be static for a reason that is perhaps more philosophical
than practical. I view MCTS as a procedure to maximize the expected
value of a utility function (i.e., how happy I am with the result),
which is in some important sense the only rational way to make
decisions. If the utility of any win is the same, it makes sense to
simply maximize the probability of winning. If we are not happy with
the program wasting points in a favorable endgame, it must be the case
that we are happier with a win by a large margin than with a win by a
small margin, so it makes sense to build that into the reward
function, which is what val_scale does. Perhaps a sigmoid of some sort
would be a better shape, but it should not be something that changes
dynamically.
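
A minimal sketch of the idea, to make the argument concrete. It is not
Pachi's code: the blending formula, the max_margin cap, and the two
example moves with their probabilities are all assumptions made up for
illustration. Only the name val_scale and its default of 0.04 come from
the thread.

```python
# Hypothetical sketch; not Pachi's actual code or formula.

def reward(margin, val_scale=0.04, max_margin=40.0):
    """Utility of a finished game, in [0, 1].

    margin: final score from our point of view (positive means we won).
    val_scale: weight given to win size; 0 means only win/loss counts.
    max_margin: assumed cap used to normalize the margin.
    """
    win = 1.0 if margin > 0 else 0.0
    clipped = max(-max_margin, min(max_margin, margin))
    size = (clipped + max_margin) / (2.0 * max_margin)  # map to [0, 1]
    return (1.0 - val_scale) * win + val_scale * size


def expected_utility(outcomes, val_scale):
    """outcomes: list of (probability, margin) pairs for one move."""
    return sum(p * reward(m, val_scale) for p, m in outcomes)


# Two made-up endgame moves: a safe one and a greedy one.
SAFE = [(0.99, 0.5), (0.01, -0.5)]
GREEDY = [(0.95, 10.5), (0.05, -5.5)]


def preferred(val_scale):
    """Name of the move with the higher expected utility."""
    safe = expected_utility(SAFE, val_scale)
    greedy = expected_utility(GREEDY, val_scale)
    return "safe" if safe >= greedy else "greedy"
```

With these made-up numbers, preferred(0.0) and preferred(0.04) both pick
the safe move, while preferred(0.5) picks the greedy one. The preference
for larger wins lives entirely in the fixed reward function; nothing has
to change during the game.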

Álvaro.
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
