Hi Begué and Don, I did this in my earlier version of ggmc. The real code was:
reward = 0.5 * (1 + tanhf(K * (score - komi)));
# tanhf() is the float (not double) version of the hyperbolic tangent function.
# I use tanh() because exp() may overflow.
# The code is still available at http://www.gggo.jp/.

I got the best performance with K around 10, but the gain was so small
that I'm using the simpler version now.

-Hideki

Don Dailey: <[EMAIL PROTECTED]>:
>Nice idea and worth a try. I predict that this will weaken the
>program no matter what value you use, but that there may indeed be a
>reasonable compromise that gives you the "better" behavior with only a
>very small decline in strength.
>
>I think this bothers people so much that they would be willing to
>sacrifice a tiny bit of strength to get the greedy behavior.
>
>- Don
>
>
>Álvaro Begué wrote:
>> At the end of a playout there is probably some code that says
>> something like
>>   reward = (score > komi) ? 1.0 : 0.0;
>>
>> You can just replace it with
>>   reward = 1 / (1 + exp(- K * (score - komi)));
>>
>> A huge value of K will reproduce the old behaviour, a tiny value will
>> result in a program that tries to maximize expected score, and values
>> in the middle will blend both things nicely. Of course you would
>> precompute this in a table.
>>
>> This seems elegant and simple to me. Now we only need to know how it
>> affects performance. I bet there are values of K that would make
>> everyone happy (no measurable loss in strength, still play
>> good-looking moves even if the game is decided).
>>
>>
>> Álvaro.
>>
>>
>> On Dec 13, 2007 3:42 PM, Chris Fant <[EMAIL PROTECTED]> wrote:
>>
>>     On Dec 13, 2007 3:33 PM, Chris Fant <[EMAIL PROTECTED]> wrote:
>>     > Seems like the final solution to this would need to build out the
>>     > search tree to the end of the game, finding a winning line. And then
>>     > search again with a different evaluation function (one based on
>>     > points).
>>     > If the second search cannot find a line that wins bigger
>>     > than the first search did, just play the move returned by the first
>>     > search. And you could get more clever by allowing the second search
>>     > to start with some information from the first search. Note that when
>>     > I say "winning line", I mean all the way to the end. No MC here.
>>     >
>>
>>
>>     Actually, I suppose it need not be to the absolute end of the game.
>>     As long as all MC sims that finish out the game prior to scoring lead
>>     to a win, then you can consider the tree portion a guaranteed winning
>>     line and try the second search to maximize points.
>>     _______________________________________________
>>     computer-go mailing list
>>     [email protected]
>>     http://www.computer-go.org/mailman/listinfo/computer-go/

--
[EMAIL PROTECTED] (Kato)
