Re: [computer-go] 19x19 MC improvement

2008-01-27 Thread Rémi Coulom

Eric Boesch wrote:


By the way, does anybody know of any nifty tools or heuristics for
efficient probabilistic multi-parameter optimization? In other words,
like multi-dimensional optimization, except instead of your function
returning a deterministic value, it returns the result of a Bernoulli
trial, and the heuristic uses those trial results to converge as
rapidly as possible to parameter values that roughly maximize the
success probability. The obvious approach is to cycle through all
dimensions in sequence, treating it as a one-dimensional optimization
problem -- though the best way to optimize in one dimension isn't
obvious to me either -- but just as with the deterministic version of
optimization, I assume it's possible to do better than that. It might
be a fun problem to play with, but if good tools already exist then it
wouldn't be very productive for me to waste time reinventing the
broken wheel.
  

RSPSA:
http://www.sztaki.hu/~szcsaba/papers/rspsa_acg.pdf

Levente Kocsis, Csaba Szepesvari, Mark H.M. Winands

Abstract. Most game programs have a large number of parameters that
are crucial for their performance. While tuning these parameters by
hand is rather difficult, successful applications of automatic
optimisation algorithms in game programs are known only for parameters
that belong to certain components (e.g. evaluation-function
parameters). The SPSA (Simultaneous Perturbation Stochastic
Approximation) algorithm is an attractive choice for optimising any
kind of parameters of a game program, both for its generality and its
simplicity. Its disadvantage is that it can be very slow. In this
article we propose several methods to speed up SPSA, in particular the
combination with RPROP, using common random numbers, antithetic
variables and averaging. We test the resulting algorithm for tuning
various types of parameters in two domains, poker and LOA. From the
experimental study, we conclude that using SPSA is a viable approach
for optimisation in game programs, especially if no good alternative
exists for the types of parameters considered.
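
For the curious, here is a minimal sketch of plain SPSA on a noisy
win/loss objective. play_match() is a hypothetical stand-in for playing
one game at the given parameter settings and returning 1 for a win and
0 for a loss, and the step/perturbation schedules are generic textbook
choices; the RPROP, common-random-number, antithetic-variable and
averaging refinements from the paper are not shown.

  // spsa_sketch.cpp -- vanilla SPSA on a noisy win/loss objective (sketch only).
  #include <cmath>
  #include <cstdio>
  #include <random>
  #include <vector>

  static std::mt19937 rng(12345);

  // Hypothetical hook: play one game with the given parameters and return
  // 1.0 on a win, 0.0 on a loss. Faked here by a smooth "win probability"
  // peaked at (0.3, 1.5), sampled as a Bernoulli trial.
  double play_match(const std::vector<double>& p) {
      double prob = std::exp(-std::pow(p[0] - 0.3, 2) - std::pow(p[1] - 1.5, 2) / 4.0);
      return std::bernoulli_distribution(prob)(rng) ? 1.0 : 0.0;
  }

  int main() {
      std::vector<double> theta = {1.0, 0.5};        // initial parameter guess
      for (int k = 1; k <= 2000; ++k) {
          double a = 0.5 / std::pow(k, 0.602);       // step-size schedule
          double c = 0.2 / std::pow(k, 0.101);       // perturbation-size schedule
          // Perturb every parameter simultaneously by +/- c.
          std::vector<double> delta(theta.size()), plus = theta, minus = theta;
          for (size_t i = 0; i < theta.size(); ++i) {
              delta[i] = std::bernoulli_distribution(0.5)(rng) ? 1.0 : -1.0;
              plus[i]  += c * delta[i];
              minus[i] -= c * delta[i];
          }
          // Two noisy evaluations estimate the gradient in all dimensions at once.
          double diff = play_match(plus) - play_match(minus);
          for (size_t i = 0; i < theta.size(); ++i)
              theta[i] += a * diff / (2.0 * c * delta[i]);
      }
      std::printf("estimated optimum: %.3f %.3f\n", theta[0], theta[1]);
  }

The appeal for game programs is that each iteration costs only two
games, regardless of the number of parameters being tuned.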

Rémi
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] 19x19 MC improvement

2008-01-26 Thread Eric Boesch
On Jan 23, 2008 7:39 PM, Jason House [EMAIL PROTECTED] wrote:
 On Wed, 2008-01-23 at 18:57 -0500, Eric Boesch wrote:
  I am curious if any of those of you who have heavy-playout programs
  would find a benefit from the following modification:
 
 exp_param = sqrt(0.2); // sqrt(2) times the original parameter value.
 uct = exp_param * sqrt( log(sum of all children playout)
 * (child-win-rate-2) /
   (number of child playout) );
 uct_value = (child winning rate) + uct;
 
  where child-win-rate-2 is defined as
 
  (#wins + 1) / (#wins + #losses + 2)

 I'm surprised to see that this works as listed, because the math looks
 all wrong to me...

Argh. I have to retract the claim that this helps. I didn't optimize
the libego parameters correctly before I tested it. Sorry about that
-- I thought I had. There's a lot more I could add, but I thought I'd
get that out there before anyone (probably) wasted any more time on my
error.

By the way, does anybody know of any nifty tools or heuristics for
efficient probabilistic multi-parameter optimization? In other words,
like multi-dimensional optimization, except instead of your function
returning a deterministic value, it returns the result of a Bernoulli
trial, and the heuristic uses those trial results to converge as
rapidly as possible to parameter values that roughly maximize the
success probability. The obvious approach is to cycle through all
dimensions in sequence, treating it as a one-dimensional optimization
problem -- though the best way to optimize in one dimension isn't
obvious to me either -- but just as with the deterministic version of
optimization, I assume it's possible to do better than that. It might
be a fun problem to play with, but if good tools already exist then it
wouldn't be very productive for me to waste time reinventing the
broken wheel.
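
For what it's worth, the "cycle through the dimensions" approach looks
roughly like the sketch below. win_rate() is a hypothetical stand-in
for playing a fixed number of games at the given settings and returning
the observed winning fraction, and the probe-left/probe-right step is
the crudest possible one-dimensional search, not a recommendation.

  // coordinate_cycle.cpp -- optimize one parameter at a time against a
  // noisy win/loss objective (sketch only).
  #include <cmath>
  #include <cstdio>
  #include <random>
  #include <vector>

  static std::mt19937 rng(7);

  // Hypothetical hook: play `games` games at parameters p and return the
  // observed winning fraction. Faked here with a smooth objective.
  double win_rate(const std::vector<double>& p, int games) {
      double prob = std::exp(-std::pow(p[0] - 0.3, 2) - std::pow(p[1] - 1.5, 2) / 4.0);
      return std::binomial_distribution<int>(games, prob)(rng) / double(games);
  }

  int main() {
      std::vector<double> best = {1.0, 0.5};
      double step = 0.5;
      for (int sweep = 0; sweep < 8; ++sweep) {      // several passes, shrinking the step
          for (size_t d = 0; d < best.size(); ++d) {
              double best_rate = win_rate(best, 200);
              for (double delta : {-step, step}) {   // probe both sides of the current value
                  std::vector<double> trial = best;
                  trial[d] += delta;
                  double rate = win_rate(trial, 200);
                  if (rate > best_rate) { best_rate = rate; best = trial; }
              }
          }
          step *= 0.6;
      }
      std::printf("best found: %.3f %.3f\n", best[0], best[1]);
  }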
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] 19x19 MC improvement

2008-01-26 Thread wing
Eric Boesch

This is probably massive overkill, but one of the most
successful techniques for multi-parameter optimization
is the family of Taguchi methods.

   http://en.wikipedia.org/wiki/Taguchi_methods

However, in my experience, starting with the decisions
that make the biggest difference, and then adding in
the next-biggest decision, and so on, usually gets
close enough to optimal.

In my cgbg performance tests, most reasonable decisions
affect total program performance by a factor of 2 at most,
which translates to 1 rank at most. Likewise, I doubt that
tweaking of existing MC parameters will bring more than
1 rank. Further improvement will require new ideas.

Michael Wing

 By the way, does anybody know of any nifty tools or heuristics for
 efficient probabilistic multi-parameter optimization? In other words,
 like multi-dimensional optimization, except instead of your function
 returning a deterministic value, it returns the result of a Bernoulli
 trial, and the heuristic uses those trial results to converge as
 rapidly as possible to parameter values that roughly maximize the
 success probability. The obvious approach is to cycle through all
 dimensions in sequence, treating it as a one-dimensional optimization
 problem -- though the best way to optimize in one dimension isn't
 obvious to me either -- but just as with the deterministic version of
 optimization, I assume it's possible to do better than that. It might
 be a fun problem to play with, but if good tools already exist then it
 wouldn't be very productive for me to waste time reinventing the
 broken wheel.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] 19x19 MC improvement

2008-01-26 Thread steve uurtamo
i recommend:

http://www.research.att.com/~njas/gosset/index.html

s.


- Original Message 
From: [EMAIL PROTECTED] [EMAIL PROTECTED]
To: computer-go computer-go@computer-go.org
Sent: Saturday, January 26, 2008 2:35:01 PM
Subject: Re: [computer-go] 19x19 MC improvement


Eric Boesch

This is probably massive overkill, but one of the most
successful techniques for multi-parameter optimization
is the family of Taguchi methods.

   http://en.wikipedia.org/wiki/Taguchi_methods

However, in my experience, starting with the decisions
that make the biggest difference, and then adding in
the next-biggest decision, and so on, usually gets
close enough to optimal.

In my cgbg performance tests, most reasonable decisions
affect total program performance by a factor of 2 at most,
which translates to 1 rank at most. Likewise, I doubt that
tweaking of existing MC parameters will bring more than
1 rank. Further improvement will require new ideas.

Michael Wing

 By the way, does anybody know of any nifty tools or heuristics for
 efficient probabilistic multi-parameter optimization? In other words,
 like multi-dimensional optimization, except instead of your function
 returning a deterministic value, it returns the result of a Bernoulli
 trial, and the heuristic uses those trial results to converge as
 rapidly as possible to parameter values that roughly maximize the
 success probability. The obvious approach is to cycle through all
 dimensions in sequence, treating it as a one-dimensional optimization
 problem -- though the best way to optimize in one dimension isn't
 obvious to me either -- but just as with the deterministic version of
 optimization, I assume it's possible to do better than that. It might
 be a fun problem to play with, but if good tools already exist then it
 wouldn't be very productive for me to waste time reinventing the
 broken wheel.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/





  

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] 19x19 MC improvement

2008-01-26 Thread Adrian Grajdeanu

By the way, does anybody know of any nifty tools or heuristics for
efficient probabilistic multi-parameter optimization? In other words,
like multi-dimensional optimization, except instead of your function
returning a deterministic value, it returns the result of a Bernoulli
trial, and the heuristic uses those trial results to converge as
rapidly as possible to parameter values that roughly maximize the
success probability.


I recommend evolutionary algorithms because they are robust to noise and 
don't require a quadratic or linear model of the function they 
optimize. I would go as simple as a (1+1)-ES (a glorified name 
for a simple hill climber that probes randomly for its next step). I 
would also use restarts: run it once until no more improvement is 
apparent, then run it again and again (restarts) a few times (5-10) and 
take the overall optimum found. You'd be surprised how far you can get 
with this method!
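
In code, a (1+1)-ES with restarts on a noisy win/loss objective could
look something like the sketch below. win_rate() is a hypothetical hook
that plays a fixed number of games and returns the observed winning
fraction, and the step-size adaptation is a crude 1/5th-rule-style
choice, not necessarily what Adrian has in mind.

  // es_one_plus_one.cpp -- (1+1)-ES hill climber with random restarts (sketch).
  #include <cmath>
  #include <cstdio>
  #include <random>
  #include <vector>

  static std::mt19937 rng(42);

  // Hypothetical hook: play `games` games at parameters p and return the
  // observed winning fraction. Faked here with a smooth objective.
  double win_rate(const std::vector<double>& p, int games) {
      double prob = std::exp(-std::pow(p[0] - 0.3, 2) - std::pow(p[1] - 1.5, 2) / 4.0);
      return std::binomial_distribution<int>(games, prob)(rng) / double(games);
  }

  int main() {
      const int restarts = 7, steps = 300, games = 200;
      std::vector<double> best = {0.0, 0.0};
      double best_rate = -1.0;
      std::uniform_real_distribution<double> init(0.0, 2.0);
      std::normal_distribution<double> gauss(0.0, 1.0);
      for (int r = 0; r < restarts; ++r) {
          std::vector<double> parent = {init(rng), init(rng)};
          double parent_rate = win_rate(parent, games), sigma = 0.3;
          for (int s = 0; s < steps; ++s) {
              std::vector<double> child = parent;   // one parent, one mutated child
              for (double& x : child) x += sigma * gauss(rng);
              double child_rate = win_rate(child, games);
              if (child_rate >= parent_rate) {      // keep the child if it is no worse
                  parent = child; parent_rate = child_rate;
                  sigma *= 1.1;                     // widen the step after a success
              } else {
                  sigma *= 0.97;                    // narrow it after a failure
              }
          }
          if (parent_rate > best_rate) { best_rate = parent_rate; best = parent; }
      }
      std::printf("best of %d restarts: %.3f %.3f (rate %.3f)\n",
                  restarts, best[0], best[1], best_rate);
  }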


Adrian

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


RE: [computer-go] 19x19 MC improvement

2008-01-23 Thread David Fotland
Congratulations!  I'm really impressed with how fast AyaMC has improved.

Regards,

David

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:computer-go-
 [EMAIL PROTECTED] On Behalf Of Hiroshi Yamashita
 Sent: Wednesday, January 23, 2008 12:12 AM
 To: computer-go
 Subject: [computer-go] 19x19 MC improvement
 
 My program Aya637_4CPU is at the top of 19x19 CGOS now.
 It is temporary, but I'm so happy!
 Please don't run full-power MoGo, Crazy Stone and Leela :-)
 
 My main improvements were
 
 1. Do not move dead stones.
  Before doing playouts, Aya does a string-capture search up to 7 plies
  (ladders are extended), and sets the probability of dead stones'
  liberties to 1/1.
  This changed the winning rate from 0.17 to 0.33 (both over 200 games,
  4po, against GnuGo 3.7.10). But in 9x9, this method changed nothing.
  There is no ladder search in the playouts.
 
 2. Change UCT exploration parameter
 
   exp_param = sqrt(2.0);
   uct = exp_param * sqrt( log(sum of all children playout) /
 (number of child playout) );
   uct_value = (child winning rate) + uct;
 
  I changed exp_param from sqrt(2.0) to sqrt(0.1).
  This changed the winning rate from 0.10 (660 games) to 0.22 (140 games)
  (1po, against GnuGo 3.7.10).
  I was surprised that such a small parameter change made a big difference.
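
(For reference, that selection rule in code form -- a minimal sketch
only: the Child struct and the sample counts are made up, every child
is assumed to have at least one playout, and progressive widening and
the rest of the search are omitted.)

  // ucb_select.cpp -- plain UCB1 child selection with exp_param = sqrt(0.1).
  #include <cmath>
  #include <cstdio>
  #include <vector>

  struct Child { int wins; int playouts; };     // illustrative node statistics

  // Pick the child maximizing (winning rate) + exp_param * sqrt(log(N) / n).
  // Assumes every child already has at least one playout.
  int select_child(const std::vector<Child>& children) {
      const double exp_param = std::sqrt(0.1);  // was sqrt(2.0) before the change
      int parent_playouts = 0;
      for (const Child& c : children) parent_playouts += c.playouts;
      int best = 0;
      double best_value = -1.0;
      for (size_t i = 0; i < children.size(); ++i) {
          double win_rate = children[i].wins / (double)children[i].playouts;
          double uct = exp_param *
              std::sqrt(std::log((double)parent_playouts) / children[i].playouts);
          if (win_rate + uct > best_value) { best_value = win_rate + uct; best = (int)i; }
      }
      return best;
  }

  int main() {
      std::vector<Child> children = {{40, 100}, {30, 50}, {5, 10}};
      std::printf("selected child: %d\n", select_child(children));
  }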
 
 AyaMC's style is similar to Crazy Stone's:
  - progressive widening.
  - patterns are 3x3 only, from one professional player's games.
 The UCT code is like this: http://www.yss-aya.com/uct_aya_e.cpp
 In 1po of the initial root position, 11 nodes are created
  and 18 candidates are searched.
 
 Regards,
 Hiroshi Yamashita
 
 ___
 computer-go mailing list
 computer-go@computer-go.org
 http://www.computer-go.org/mailman/listinfo/computer-go/

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] 19x19 MC improvement

2008-01-23 Thread Eric Boesch
On Jan 23, 2008 3:11 AM, Hiroshi Yamashita [EMAIL PROTECTED] wrote:
 2. Change UCT exploration parameter
   exp_param = sqrt(2.0);
   uct = exp_param * sqrt( log(sum of all children playout) /
 (number of child playout) );
   uct_value = (child winning rate) + uct;

  I changed exp_param sqrt(2.0) to sqrt(0.1).

2.0 to 0.1? That's a pretty big step. I did notice that libego worked
better with lower constants than recommended by the formula -- in
libego, the original UCB1 formula corresponds to an explore_rate
coefficient of 8, but the default explore_rate coefficient of 1 seemed
to be about the best.

I am curious whether any of you who have heavy-playout programs
would find a benefit from the following modification:

   exp_param = sqrt(0.2); // sqrt(2) times the original parameter value.
   uct = exp_param * sqrt( log(sum of all children playout)
                           * (child-win-rate-2)
                           / (number of child playout) );
   uct_value = (child winning rate) + uct;

where child-win-rate-2 is defined as

(#wins + 1) / (#wins + #losses + 2)

If it happens that you do the equivalent of initializing #wins to 1
and #losses to 1 (in libego, setting the initial bias to 2), then you
can just use your original (child winning rate) value. (W+1)/(W+L+2)
is the mean of a beta(W+1,L+1) random variable, which is what you get
when you start with a uniform(0,1) distribution and condition it by
the observation of a W-L record. The explore parameter inside the
square root is doubled so that when you have an average
child-win-rate-2 value of 0.5, the formula returns the same value as
before. (Using an initial bias of 2.0 seems like a good thing anyhow.)
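
In code, one reading of the modification is the sketch below (a minimal
sketch only: the Child struct and the sample counts are made up, and
each child is assumed to have at least one playout).

  // ucb_beta_mean.cpp -- exploration term scaled by (W+1)/(W+L+2), the mean
  // of a Beta(W+1, L+1) posterior, as described above (sketch only).
  #include <cmath>
  #include <cstdio>
  #include <vector>

  struct Child { int wins; int losses; };       // illustrative node statistics

  double uct_value(const Child& c, int parent_playouts) {
      const double exp_param = std::sqrt(0.2);          // sqrt(2) * the original sqrt(0.1)
      int n = c.wins + c.losses;
      double win_rate   = c.wins / (double)n;           // plain observed winning rate
      double win_rate_2 = (c.wins + 1.0) / (n + 2.0);   // Beta(W+1, L+1) posterior mean
      double uct = exp_param *
          std::sqrt(std::log((double)parent_playouts) * win_rate_2 / n);
      return win_rate + uct;
  }

  int main() {
      std::vector<Child> children = {{40, 60}, {30, 20}, {5, 5}};
      int total = 0;
      for (const Child& c : children) total += c.wins + c.losses;
      for (size_t i = 0; i < children.size(); ++i)
          std::printf("child %zu: value = %.4f\n", i, uct_value(children[i], total));
  }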

Adding this extra term seemed to help a bit (a 57% +/- 4.5% win rate over
the unmodified program) when the basic playouts were uniform.

This change tends to make the formula stick with proven winners more
than before, so you might need to increase the explore rate by a factor
of a little more than sqrt(2).

Your mileage may vary -- I'd also like to know if you try this change
and find it unhelpful.

The justification is that the standard deviation of a beta(1,21) is
lower than the standard deviation of a beta(11,11) variable. The error
term of a beta(21,1) is lower too, but applying the (L+1)/(W+L+2) term
to the UCB1 formula can yield nonsensical results where, when two moves
have the same bias, the one with the worse win-loss record is assigned
the higher UCB1 value.
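
A quick numeric check of that claim, using the closed-form standard
deviation of a Beta(a, b) variable, sqrt(a*b / ((a+b)^2 * (a+b+1))):

  // beta_std.cpp -- for the same number of trials (20), a lopsided record
  // gives a narrower posterior than an even one.
  #include <cmath>
  #include <cstdio>

  double beta_std(double a, double b) {
      return std::sqrt(a * b / ((a + b) * (a + b) * (a + b + 1.0)));
  }

  int main() {
      std::printf("Beta(11,11) std = %.4f\n", beta_std(11, 11));  // ~0.104 (10-10 record)
      std::printf("Beta(1,21)  std = %.4f\n", beta_std(1, 21));   // ~0.043 (0-20 record)
      std::printf("Beta(21,1)  std = %.4f\n", beta_std(21, 1));   // ~0.043 (20-0 record)
  }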
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] 19x19 MC improvement

2008-01-23 Thread Jason House

On Wed, 2008-01-23 at 18:57 -0500, Eric Boesch wrote:
 I am curious if any of those of you who have heavy-playout programs
 would find a benefit from the following modification:
 
exp_param = sqrt(0.2); // sqrt(2) times the original parameter value.
uct = exp_param * sqrt( log(sum of all children playout)
* (child-win-rate-2) /
  (number of child playout) );
uct_value = (child winning rate) + uct;
 
 where child-win-rate-2 is defined as
 
 (#wins + 1) / (#wins + #losses + 2)


I'm surprised to see that this works as listed, because the math looks
all wrong to me...

I usually think of UCT as being based on the sample variance.
It looks like you're using:
  sqrt(child_win_rate_2/number_child_playouts)
Standard Bernoulli trials yield:
  sqrt(child_win_rate*(1-child_win_rate)/number_child_playouts)
Beta distribution yields:
  sqrt(win_rate_2*(1-win_rate_2)/(number_child_playouts+3))

When using the full beta distribution theory, uct_value would be:
  child_win_rate_2 + uct

Is it possible you have a typo in your uct calculation?  If not, you're
really favoring high win rates over low win rates.  Or maybe with
inversions of wins and losses between ply, you're favoring exploration
of the less probable moves?  I'd really be interested in hearing more
about what you did!

PS: I'm glad to see someone else using the beta distribution theory.  I
posted it to the mailing list long ago, but didn't think anyone found it
very interesting/useful.
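
As one concrete reading of that beta-distribution value (not
necessarily what Jason or Eric implemented -- the way the log term is
combined with the posterior standard deviation here is just one
plausible choice, and the names and the constant c are made up):

  // ucb_beta_variance.cpp -- posterior mean plus an exploration term built
  // from the Beta(W+1, L+1) posterior standard deviation (sketch only).
  #include <cmath>
  #include <cstdio>

  double beta_uct_value(int wins, int losses, int parent_playouts, double c) {
      int n = wins + losses;
      double mean = (wins + 1.0) / (n + 2.0);           // Beta(W+1, L+1) posterior mean
      double var  = mean * (1.0 - mean) / (n + 3.0);    // its posterior variance
      return mean + c * std::sqrt(var * std::log((double)parent_playouts));
  }

  int main() {
      std::printf("%.4f\n", beta_uct_value(30, 20, 160, std::sqrt(2.0)));
      std::printf("%.4f\n", beta_uct_value(5, 5, 160, std::sqrt(2.0)));
  }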

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/