On Sun, Jul 3, 2011 at 10:14 PM, terry mcintyre <[email protected]> wrote:
> From: Jean-loup Gailly <[email protected]>
> To: [email protected]
> Sent: Sun, July 3, 2011 9:12:59 AM
> Subject: Re: [Computer-go] MCTS and perfect endgame
>
> Leon,
>> One of problems (which I tested with gogui, thankyou very much)
>> was losing points in endgame when program is winning.
> This is by design. Pachi maximises the chance of winning, not the number
> of points. But if you want Pachi to win by more points while increasing
> the risk of losing, you can simply increase the parameter val_scale. See the
> description in uct/uct.c: "How much of the game result value should be
> influenced by win size. Zero means it isn't". The default value is 0.04,
> which is the result of tuning. (If you increase val_scale above this it
> starts losing more.)
>
> Why should this value be static? Shouldn't the behavior change when there is
> a certain win?

It should be static for a reason that is perhaps more philosophical than practical. I view MCTS as a procedure to maximize the expected value of a utility function (i.e., how happy I am with the result), which is in some important sense the only rational way to make decisions. If the utility of any win is the same, it makes sense to simply maximize the probability of winning.

If we are not happy with the program wasting points in a favorable endgame, it must be the case that we are happier with a win by a large margin than with a win by a small margin. It then makes sense to build that preference into the reward function, which is what val_scale does. Perhaps a sigmoid of some sort would be a better shape, but it should not be something that changes dynamically.

Álvaro.
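
To make the reward-function point concrete, here is a minimal Python sketch of a fixed utility that mixes the win/loss outcome with the win size. This is not Pachi's code from uct/uct.c. Only val_scale and its default of 0.04 come from the thread; the function names, the linear clamping, max_margin, and steepness are assumptions invented for illustration.

```python
import math


def linear_reward(margin, val_scale=0.04, max_margin=361.0):
    """Utility of a finished playout for the player to move.

    margin is the final score difference in points (positive means a win).
    val_scale is the weight of the win size; 0 means only win/loss counts.
    max_margin is a hypothetical clamp, not a value taken from Pachi.
    """
    win = 1.0 if margin > 0 else 0.0
    # Map the margin linearly into [0, 1], clamped at +/- max_margin.
    clamped = max(-max_margin, min(max_margin, margin))
    size = 0.5 + 0.5 * clamped / max_margin
    return (1.0 - val_scale) * win + val_scale * size


def sigmoid_reward(margin, val_scale=0.04, steepness=0.1):
    """Same blend, but with a sigmoid of the margin as the win-size term.

    steepness is a hypothetical parameter controlling how quickly extra
    points stop mattering.
    """
    win = 1.0 if margin > 0 else 0.0
    size = 1.0 / (1.0 + math.exp(-steepness * margin))
    return (1.0 - val_scale) * win + val_scale * size


if __name__ == "__main__":
    for m in (-20.5, -0.5, 0.5, 20.5):
        print(m, linear_reward(m), sigmoid_reward(m))
```

In both variants the function is the same at every point in the game: the search simply maximizes its expected value. With a small val_scale, any win is still worth far more than any loss, and the margin only breaks ties between lines with similar winning chances.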
