There are some pure-UCT engines with light playouts available for comparison. Someone is likely willing to run their bot on CGOS for comparison, but offline tests will give you meaningful results much more quickly.
Sent from my iPhone

On Mar 26, 2011, at 11:49 AM, Daniel Shawul <[email protected]> wrote:

> I am using 400 now and I almost never get intermediate cuts, i.e. it plays out
> to the end. But still the shallow tactical search depth (depth = 2) is not
> helping it. Is that really what I should expect at the start position? No
> matter what heavy playout scheme I use, if it can't get deeper I don't see how
> it is going to beat the alpha-beta version. Please, I need an answer on this.
> I still have to add code to avoid filling eyes and to bias the playouts.
>
> Here is the log for the start position: pps = playouts per second, visits =
> total playouts, nodes = total nodes in the dynamic tree. Do you see anything
> suspicious about it? (Columns: move, winning rate, wins, visits.)
>
> [st = 2813ms, mt = 59250ms, moves_left 40]
> Tree : nodes 59341 depth 2 pps 9747 visits 27419
> @@@@ 0.41 58 143
> g5 0.42 76 181
> h5 0.45 146 325
> i5 0.36 26 72
> a6 0.44 122 275
> b6 0.46 164 361
> c6 0.48 471 972
> d6 0.47 283 601
> e6 0.48 430 891
> f6 0.47 292 618
> g6 0.49 584 1195
> h6 0.47 289 612
> i6 0.40 53 131
> a7 0.43 81 191
> b7 0.35 21 62
> c7 0.48 328 689
> d7 0.46 162 356
> e7 0.48 390 812
> f7 0.41 57 139
> g7 0.46 177 388
> h7 0.36 25 70
> i7 0.40 43 110
> a8 0.35 22 63
> b8 0.36 25 70
> c8 0.49 528 1085
> d8 0.00 0 8
> e8 0.47 306 646
> f8 0.39 44 112
> g8 0.44 116 263
> h8 0.44 106 242
> i8 0.42 75 178
> a9 0.36 25 70
> b9 0.38 31 84
> c9 0.14 2 14
> d9 0.39 39 102
> e9 0.40 50 124
> f9 0.38 32 85
> g9 0.42 79 186
> h9 0.39 43 109
> i9 0.21 4 21
> e5 0.48 412 856
> d5 0.46 172 378
> c5 0.49 708 1438 -> Best move played winning rate = 0.49
> visits = 1438 wins = 708
> b5 0.38 32 86
> a5 0.46 188 410
> i4 0.38 32 85
> h4 0.31 13 43
> g4 0.48 444 919
> f4 0.48 452 936
> e4 0.49 642 1309
> d4 0.48 396 824
> c4 0.34 19 57
> b4 0.46 163 358
> a4 0.46 188 409
> i3 0.46 207 447
> h3 0.29 11 39
> g3 0.43 86 200
> f3 0.41 56 138
> e3 0.47 280 595
> d3 0.47 283 600
> c3 0.46 188 409
> b3 0.47 231 497
> a3 0.46 167 367
> i2 0.37 28 76
> h2 0.44 104 239
> g2 0.45 155 342
> f2 0.39 39 102
> e2 0.27 9 33
> d2 0.25 6 26
> c2 0.43 89 207
> b2 0.43 94 217
> a2 0.43 81 191
> i1 0.41 54 133
> h1 0.14 2 14
> g1 0.42 63 153
> f1 0.35 21 62
> e1 0.46 187 407
> d1 0.48 314 662
> c1 0.46 174 381
> b1 0.36 25 70
> a1 0.32 15 48
>
> On Sat, Mar 26, 2011 at 11:33 AM, Erik van der Werf <[email protected]> wrote:
> I agree, if you use a hard limit it should be much higher (probably something
> like twice the board surface is OK).
>
> 110 moves is just the observed average playout length for the empty 9x9 board.
> With smarter playouts that average tends to become lower, but the distribution
> may still have a long tail.
>
> Erik
>
> On Sat, Mar 26, 2011 at 3:36 PM, Rémi Coulom <[email protected]> wrote:
> > I'd recommend more than 110. Maybe 200 is better. In Crazy Stone I use no
> > limit, and test for superko.
> >
> > Rémi
> >
> > On 26 mars 2011, at 15:32, Daniel Shawul wrote:
> >
> >> Sorry, 81 moves was a bad estimate on my part; I am actually using 96. I
> >> will change that to 110 or more and see what effect it has. I will also
> >> take Rémi's suggestion to bias the move-selection process. For the
> >> alpha-beta program I have a decent move-ordering algorithm and qsearch;
> >> I guess I can borrow some of that.
> >>
> >> In the meantime, I found a paper using UCT for Chinese checkers and other
> >> games,
> >> http://www.google.com/url?sa=t&source=web&cd=7&ved=0CD4QFjAG&url=http%3A%2F%2Fweb.cs.du.edu%2F~sturtevant%2Fpapers%2Fmpuct_icga.pdf&rct=j&q=UCT%20for%20checkers&ei=mPiNTcqdBsSO0QG20-nWCw&usg=AFQjCNGFgMMMG8xMtawvx-3rQtwXPhfWxQ&cad=rja
> >> and also some fun Java programs using UCT for checkers. It seems UCT is
> >> indeed competitive in checkers; I must say I didn't expect that at all.
> >> I think the forced nature of captures helps to improve the tactical
> >> awareness of the MC simulations. Is that so?
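To make the playout-length discussion above concrete: a playout loop with a hard move cap (Erik's "twice the board surface") and a whole-board-repetition (superko) test, roughly as Rémi describes, might look like the sketch below. Everything here, including the `ToyPosition` stand-in for a real Go board, is illustrative and is not code from any engine in this thread.

```python
import random

MAX_PLAYOUT_MOVES = 2 * 9 * 9   # "twice the board surface" for 9x9

class ToyPosition:
    """Minimal stand-in for a board: a running total; game ends at >= 10."""
    def __init__(self, total=0):
        self.total = total
    def legal_moves(self):
        return [] if self.total >= 10 else [1, 2, 3]
    def play(self, m):
        return ToyPosition(self.total + m)
    def hash(self):
        return self.total
    def score(self):
        # Arbitrary toy scoring: 1.0 for a "win", 0.0 for a "loss".
        return 1.0 if self.total % 2 == 0 else 0.0

def random_playout(pos, rng):
    seen = {pos.hash()}                    # positions seen, for superko
    for _ in range(MAX_PLAYOUT_MOVES):
        moves = pos.legal_moves()
        if not moves:                      # terminal: game over
            break
        pos = pos.play(rng.choice(moves))
        h = pos.hash()
        if h in seen:                      # superko: repeated position
            break
        seen.add(h)
    return pos.score()                     # terminal or capped: score it

result = random_playout(ToyPosition(), random.Random(42))
```

The cap only matters when playouts cycle or drag on; a real engine would also score a capped playout with territory counting rather than the toy rule used here.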
> >> On Sat, Mar 26, 2011 at 8:52 AM, Erik van der Werf <[email protected]> wrote:
> >> Ah, OK, I misunderstood.
> >>
> >> Still, something seems to be wrong. On the empty 9x9 board I think most
> >> programs with random/light playouts play on the order of 110 moves.
> >> ~81 moves seems quite low; in my experience you can only get such low
> >> numbers to work well if you have a lot of knowledge in your playouts.
> >> Did you check the quality of the evaluations/playouts?
> >>
> >> If you want UCT to search deeper you need good priors and perhaps
> >> something like RAVE/AMAF.
> >>
> >> Best,
> >> Erik
> >>
> >> On Sat, Mar 26, 2011 at 1:13 PM, Daniel Shawul <[email protected]> wrote:
> >> > Hello,
> >> > I am using Monte Carlo playouts for the UCT method. It can do about
> >> > 10k/sec. The UCT tree is expanded to a depth of d = 3 in a 5-second
> >> > search; from then onwards a random playout (with no bias) is carried
> >> > out. Actually it is a 'partial playout' which doesn't go to the end
> >> > of the game, but rather up to a depth of MAX_PLY = 96. If the game
> >> > has ended earlier, a win/draw/loss is returned; otherwise I forcefully
> >> > end the game with a deterministic eval and assign a WDL. For 9x9 Go,
> >> > most random playouts actually end before move 81.
> >> > For the alpha-beta searcher I do classical evaluation. With heavy use
> >> > of reductions I can get a depth of 14 half-plies, which seems to give
> >> > it quite an edge over the UCT version.
> >> > Is the depth of expansion for the UCT tree too low (d = 3 in a
> >> > 5-second search)? Should I lower the UCTK parameter to 0.1 or so,
> >> > which seems to give me depth = 7 at the start position of 9x9 Go?
> >> > I am confident my implementation is correct, because it is working
> >> > quite well in my checkers program, against my expectation.
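Daniel mentions he still has to add code to avoid filling eyes, and Erik's point about knowledge in the playouts points the same way. A common minimal filter is the single-point "true eye" test: never play on an empty point all of whose neighbours are friendly stones or the board edge (a full true-eye test also checks diagonals, which this sketch omits). The board layout and all names below are illustrative, not from any engine discussed here.

```python
EMPTY, BLACK, WHITE, BORDER = 0, 1, 2, 3
N = 9
W = N + 2                       # padded row width: border ring around the board

def neighbours(p):
    return (p - W, p - 1, p + 1, p + W)

def is_single_point_eye(board, p, colour):
    """True if p is empty and every neighbour is `colour` or the border.
    (Diagonal checks for false eyes are omitted in this sketch.)"""
    if board[p] != EMPTY:
        return False
    return all(board[q] in (colour, BORDER) for q in neighbours(p))

# Demo: a padded 9x9 board with a black eye in a corner.
board = [BORDER] * (W * W)
for r in range(1, N + 1):
    for c in range(1, N + 1):
        board[r * W + c] = EMPTY
p = 1 * W + 1                   # a corner point of the playable area
for q in neighbours(p):
    if board[q] == EMPTY:
        board[q] = BLACK        # surround p with black stones
```

In the playout move generator one would simply skip any move for which `is_single_point_eye` is true for the side to move; that alone stops random playouts from destroying their own live groups.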
> >> > thanks
> >> > Daniel
> >> >
> >> > On Sat, Mar 26, 2011 at 7:54 AM, Erik van der Werf <[email protected]> wrote:
> >> >>
> >> >> It sounds like you're using a classical (deterministic) evaluation
> >> >> function. Try combining UCT with Monte Carlo evaluation.
> >> >>
> >> >> Erik
> >> >>
> >> >> On Sat, Mar 26, 2011 at 12:43 PM, Daniel Shawul <[email protected]> wrote:
> >> >> > Hello,
> >> >> > I am very new to UCT; I just implemented basic UCT for Go
> >> >> > yesterday, but with no success so far, I think mostly because it
> >> >> > doesn't search very deep (depth = 3 on a 5-second search with
> >> >> > these values). I am using the following UCT parameters:
> >> >> > UCTK = sqrt(1/5) = 0.44, UCTN = 10 (visits after which the best
> >> >> > move is expanded).
> >> >> > Even if I lower UCTK down to 7 I get a maximum depth of d = 7 at
> >> >> > the start position for a 5-second search. For how deep a search
> >> >> > should I tune these parameters?
> >> >> > Before UCT, I had an alpha-beta searcher which sometimes plays on
> >> >> > CGOS. It reached a level of ~1500, and this engine seems to be too
> >> >> > strong for the UCT version. It just gets outsearched in some
> >> >> > tactical positions, and also in evaluation, I think.
> >> >> > For example, I have an evaluation term which gives big bonuses for
> >> >> > connected strings, which seems to give an edge in a lot of games.
> >> >> > How do you introduce such eval terms in UCT?
> >> >> > But for my checkers program, to my big surprise, UCT made a
> >> >> > significant impact. The regular alpha-beta searcher averages
> >> >> > depth = 25, but the UCT version, I think, is equally strong from
> >> >> > the games I saw.
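For reference, the UCTK parameter discussed here is the exploration constant in the standard UCB1 selection rule, winrate + UCTK * sqrt(ln(parent_visits) / child_visits). A small generic sketch (these are not Daniel's names or code) shows why lowering UCTK deepens the tree: with a small constant the current best child keeps getting revisited, so its subtree grows deep instead of the root growing wide.

```python
import math

def ucb1(child_wins, child_visits, parent_visits, uctk):
    if child_visits == 0:
        return float("inf")              # try unvisited children first
    winrate = child_wins / child_visits
    return winrate + uctk * math.sqrt(math.log(parent_visits) / child_visits)

def select_child(children, parent_visits, uctk):
    """children: list of (wins, visits) pairs; returns index of best child."""
    scores = [ucb1(w, v, parent_visits, uctk) for w, v in children]
    return max(range(len(children)), key=scores.__getitem__)

# Demo: a well-explored 0.60 child vs. a lightly visited 0.50 child.
children = [(60, 100), (5, 10)]
best_explore = select_child(children, parent_visits=110, uctk=0.44)  # explores
best_exploit = select_child(children, parent_visits=110, uctk=0.1)   # exploits
```

With UCTK = 0.44 the exploration bonus on the 10-visit child outweighs its lower winrate, so it is selected; with UCTK = 0.1 the leader keeps the visits, which is exactly the behaviour that makes the principal variation deeper at the cost of breadth.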
> >> >> > That was a kind of surprise for me, because I thought UCT would
> >> >> > work better for bushy trees and when the eval has a lot of
> >> >> > strategy. It also reached good depths, averaging 16 plies.
> >> >> > My checkers eval had only material in it, so I don't know whether
> >> >> > UCT is bringing strategy (distant information) to the game which
> >> >> > the other one doesn't have. The games are not really played out to
> >> >> > the end, but rather to MAX_PLY = 96, after which the material is
> >> >> > counted and a WDL score is assigned (I call it a partial playout).
> >> >> > Also, the fact that captures are forced seems to help a lot,
> >> >> > because it doesn't make too many mistakes.
> >> >> > I also found some positions where it encounters problems similar
> >> >> > to ladders in Go, but in the checkers case these problems are
> >> >> > still solved correctly. The only problem is that it doesn't report
> >> >> > correct-looking winning rates. For example, in a position with two
> >> >> > kings where one king chases the other to the side to mate it, but
> >> >> > the losing king can draw by making a series of correct moves to
> >> >> > reach one of the safe corners, the program displays winning rates
> >> >> > of 0.01 (when it should be more like 0.5) but still manages the
> >> >> > draw!
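One plausible explanation for the 0.01 winning rate in a position the program actually holds as a draw is that draws are being backed up as losses; the usual convention is win = 1, draw = 0.5, loss = 0, under which a reliably held draw reports roughly 0.5. This is speculation about Daniel's engine, and the sketch below is purely illustrative of that convention.

```python
WIN, DRAW, LOSS = 1, 0, -1          # illustrative WDL codes

def playout_value(wdl, draws_as_half=True):
    """Map a WDL playout result to the value backed up the tree."""
    if wdl == WIN:
        return 1.0
    if wdl == DRAW:
        return 0.5 if draws_as_half else 0.0
    return 0.0

def mean_value(results, draws_as_half=True):
    vals = [playout_value(r, draws_as_half) for r in results]
    return sum(vals) / len(vals)

# Demo: 99 held draws and one lucky win.  Scoring draws as 0.5 reports
# ~0.5, as expected; scoring them as losses collapses the rate to 0.01,
# matching the symptom described above.
results = [DRAW] * 99 + [WIN]
rate_half = mean_value(results)                      # ~0.505
rate_zero = mean_value(results, draws_as_half=False) # 0.01
```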
> >> >> > thanks, and apologies for the verbose email
> >> >> > Daniel
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
