Re: [computer-go] low-hanging fruit - yose

Hideki Kato Thu, 13 Dec 2007 13:33:04 -0800

Hi Begué and Don,

I did this in my earlier version of ggmc.  The real code was:


reward = 0.5 * (1 + tanhf(K * (score - komi)));
# tanhf() is the float (not double) version of the hyperbolic tangent function.
# I use tanh() because exp() may cause overflow.
# The code is still there, so you can see it at http://www.gggo.jp/.

I got the best performance with K around 10, but the gain was so small
that I'm using a simpler one now.

-Hideki

Don Dailey: <[EMAIL PROTECTED]>:
>Nice idea and worth a try.  I predict that this will weaken the
>program no matter what value you use, but that there may indeed be a
>reasonable compromise that gives you the "better" behavior with only a
>very small decline in strength.  
>
>I think this bothers people so much that they would be willing to
>sacrifice a tiny bit of strength to get the greedy behavior.
>
>- Don
>
>
>Álvaro Begué wrote:
>> At the end of a playout there is probably some code that says
>> something like
>>   reward = (score > komi) ? 1.0 : 0.0;
>>
>> You can just replace it with
>>   reward = 1 / (1 + exp(- K * (score - komi)));
>>
>> A huge value of K will reproduce the old behaviour, a tiny value will
>> result in a program that tries to maximize expected score, and values
>> in the middle will blend both things nicely. Of course you would
>> precompute this in a table.
>>
>> This seems elegant and simple to me. Now we only need to know how it
>> affects performance. I bet there are values of K that would make
>> everyone happy (no measurable loss in strength, still play
>> good-looking moves even if the game is decided).
>>
>>
>> Álvaro.
>>
>>
>> On Dec 13, 2007 3:42 PM, Chris Fant <[EMAIL PROTECTED]
>> <mailto:[EMAIL PROTECTED]>> wrote:
>>
>>     On Dec 13, 2007 3:33 PM, Chris Fant <[EMAIL PROTECTED]
>>     <mailto:[EMAIL PROTECTED]>> wrote:
>>     > Seems like the final solution to this would need to build out the
>>     > search tree to the end of the game, finding a winning line.  And then
>>     > search again with a different evaluation function (one based on
>>     > points).  If the second search cannot find a line that wins bigger
>>     > than the first search did, just play the move returned by the first
>>     > search.  And you could get more clever by allowing the second search
>>     > to start with some information from the first search.  Note that when
>>     > I say "winning line", I mean all the way to the end.  No MC here.
>>     >
>>
>>
>>     Actually, I suppose it need not be to the absolute end of the game.
>>     As long as all MC sims that finish out the game prior to scoring lead
>>     to a win, then you can consider the tree portion a guaranteed winning
>>     line and try the second search to maximize points.
>>     _______________________________________________
>>     computer-go mailing list
>>     [email protected] <mailto:[email protected]>
>>     http://www.computer-go.org/mailman/listinfo/computer-go/
--
[EMAIL PROTECTED] (Kato)