On Wed, 2008-01-23 at 18:57 -0500, Eric Boesch wrote:
> I am curious if any of those of you who have heavy-playout programs
> would find a benefit from the following modification:
> 
> >   exp_param = sqrt(0.2); // sqrt(2) times the original parameter value.
> >   uct = exp_param * sqrt( log(sum of all children playout)
> >                           * (child-win-rate-2) /
> >                         (number of child playout) );
> >   uct_value = (child winning rate) + uct;
> 
> where child-win-rate-2 is defined as
> 
> (#wins + 1) / (#wins + #losses + 2)


I'm surprised to see that this works as listed, because the math looks
all wrong to me...

I usually think of UCT as being based on the sample variance.
It looks like you're using:
  sqrt(child_win_rate_2/number_child_playouts)
Standard Bernoulli trials yield:
  sqrt(child_win_rate*(1-child_win_rate)/number_child_playouts)
Beta distribution yields:
  sqrt(child_win_rate_2*(1-child_win_rate_2)/(number_child_playouts+3))

When using the full beta distribution theory, uct_value would be:
  child_win_rate_2 + uct
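
To make the comparison concrete, here is a minimal Python sketch of the
three spread terms listed above. The function names are illustrative
assumptions, not taken from anyone's program, and the log factor and
exp_param are left out so that only the shape of each term is compared.

```python
import math

def quoted_spread(wins, losses):
    """Term as read from the quoted formula: sqrt(child_win_rate_2 / n)."""
    n = wins + losses
    win_rate_2 = (wins + 1) / (n + 2)
    return math.sqrt(win_rate_2 / n)

def bernoulli_spread(wins, losses):
    """Standard error of a Bernoulli proportion: sqrt(p * (1 - p) / n)."""
    n = wins + losses
    win_rate = wins / n
    return math.sqrt(win_rate * (1 - win_rate) / n)

def beta_spread(wins, losses):
    """Standard deviation of Beta(wins + 1, losses + 1)."""
    n = wins + losses
    win_rate_2 = (wins + 1) / (n + 2)
    return math.sqrt(win_rate_2 * (1 - win_rate_2) / (n + 3))

# Print the three terms for a losing, an even, and a winning child.
for wins, losses in [(2, 8), (5, 5), (8, 2)]:
    print(f"{wins}W/{losses}L"
          f"  quoted={quoted_spread(wins, losses):.3f}"
          f"  bernoulli={bernoulli_spread(wins, losses):.3f}"
          f"  beta={beta_spread(wins, losses):.3f}")
```

The Bernoulli and beta terms are symmetric around a 50% win rate, while
the quoted term keeps growing as the win rate rises.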

Is it possible you have a typo in your uct calculation?  If not, you're
really favoring high win rates over low win rates.  Or maybe with
inversions of wins and losses between plies, you're favoring exploration
of the less probable moves?  I'd really be interested in hearing more
about what you did!
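
A rough way to check the "favoring high win rates" reading is to compare
the exploration bonus of two children with equal playout counts. The
Python sketch below is only an illustration: the function names are
hypothetical, and putting the same log factor and exp_param in front of
the beta term is an assumption, not something stated in the quoted post.

```python
import math

EXP_PARAM = math.sqrt(0.2)  # value given in the quoted post

def quoted_bonus(wins, losses, parent_playouts):
    """Exploration term as written in the quoted formula."""
    n = wins + losses
    win_rate_2 = (wins + 1) / (n + 2)
    return EXP_PARAM * math.sqrt(math.log(parent_playouts) * win_rate_2 / n)

def beta_bonus(wins, losses, parent_playouts):
    """Assumed variant: same scaling, but using the beta variance."""
    n = wins + losses
    win_rate_2 = (wins + 1) / (n + 2)
    variance = win_rate_2 * (1 - win_rate_2) / (n + 3)
    return EXP_PARAM * math.sqrt(math.log(parent_playouts) * variance)

# Two children with 50 playouts each under a parent with 100 playouts.
strong = (30, 20)  # 60% win rate
weak = (20, 30)    # 40% win rate

print("quoted:", quoted_bonus(*strong, 100), quoted_bonus(*weak, 100))
print("beta:  ", beta_bonus(*strong, 100), beta_bonus(*weak, 100))
```

With the quoted term, the stronger child gets the larger bonus on top of
its already higher win rate. With the variance-based term, the two
bonuses are equal because the variance is symmetric around 50%.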

PS: I'm glad to see someone else using the beta distribution theory.  I
posted it to the mailing list long ago, but didn't think anyone found it
very interesting/useful.

_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
