Re: [Computer-go] UCT parameters and application to other games

Daniel Shawul Sat, 26 Mar 2011 09:37:26 -0700

It was just a first implementation, since I don't really know how to detect
double eyes and prevent the program from filling them.  However, I do prevent
suicide moves and ko repetitions.
First win for the UCT version after full playouts (MAX_PLY = 400)!
This game was played in Winboard, so I don't know if you can replay it. Anyway,
here it is.
I think it outplayed the alpha-beta searcher in the endgame. Before that, the
alpha-beta version was leading according to its eval.
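On the eye-filling question: a common cheap test in playouts is to treat an
empty point as an own eye when all of its orthogonal neighbours are friendly
stones and not too many diagonal neighbours belong to the opponent. The sketch
below is a minimal Python illustration of that heuristic, not NebiyuGo's code.
The board representation (a list of row strings) and the names `is_eye_like`,
`EMPTY`, `BLACK` and `WHITE` are assumptions made for this example. It does not
prove that a group has two real eyes; it only keeps random playouts from
filling single-point eyes.

```python
EMPTY, BLACK, WHITE = ".", "X", "O"   # hypothetical board symbols

ORTHO = [(-1, 0), (1, 0), (0, -1), (0, 1)]
DIAG = [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def neighbours(board, r, c, offsets):
    """Return the contents of the on-board points at the given offsets."""
    n = len(board)
    return [board[r + dr][c + dc]
            for dr, dc in offsets
            if 0 <= r + dr < n and 0 <= c + dc < n]


def is_eye_like(board, r, c, color):
    """Heuristic: is (r, c) a single-point eye for `color`?"""
    if board[r][c] != EMPTY:
        return False
    # Every on-board orthogonal neighbour must be a friendly stone.
    if any(p != color for p in neighbours(board, r, c, ORTHO)):
        return False
    diag = neighbours(board, r, c, DIAG)
    opponent = WHITE if color == BLACK else BLACK
    enemy_diag = sum(1 for p in diag if p == opponent)
    # On the edge or in a corner no enemy diagonal is tolerated;
    # in the interior at most one is.
    on_edge = len(diag) < 4
    return enemy_diag == 0 if on_edge else enemy_diag <= 1


board = [
    ".X...",
    "XX...",
    "..X..",
    ".X.X.",
    "..X..",
]
print(is_eye_like(board, 0, 0, BLACK), is_eye_like(board, 3, 2, BLACK))
```

In a playout, a move generator would skip any point for which
`is_eye_like(board, r, c, side_to_move)` is true.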


[Event "Computer Chess Game"]
[Site "CEE-3624-AB52"]
[Date "2011.03.26"]
[Round "-"]
[White "NebiyuGo_1.2"]
[Black "NebiyuGo_1.2"] -----> This one was the UCT version
[Result "0-1"]
[TimeControl "40/120"]
[Variant "go"]
[FEN "9/9/9/9/9/9/9/9/9 w - - 0 1"]
[SetUp "1"]

{--------------
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
white to play
--------------}
1. P@f5 {+1.96/13} P@d6 2. P@d5 {+2.38/16 2.6} P@c5 3. P@e5 {+1.99/16 2.9}
P@c4 4. P@e6 {+2.10/16 2.7} P@d4 5. P@e7 {+2.16/15 2.2} P@f3 6. P@c6
{+3.05/17 3} P@g4 7. P@g5 {+2.90/16 2.6} P@e4 8. P@h4 {+3.08/15 2.4} P@b6
9. P@d7 {+3.40/16 2.6} P@h5 10. P@f4 {+3.00/15 2.2} P@g3 11. P@b7
{+2.49/16 2.9} P@e3 12. P@h6 {+3.70/17 3} P@h3 13. P@i5 {+4.58/18 2.8} P@b5
14. P@c7 {+4.55/13 2.5} P@a7 15. P@a8 {+4.81/18 3} P@a6 16. P@b8
{+5.07/18 2.6} P@i3 17. P@h7 {+5.86/16 2.9} P@i4 18. P@h5 {+5.42/15 0.8}
P@f8 19. P@e8 {+7.07/16 2.9} P@d1 20. P@b3 {+7.58/16 3} P@g8 21. P@h8
{+7.91/18 2.9} P@d2 22. P@c2 {+7.91/18 3} P@c1 23. P@b1 {+8.30/17 3} P@c3
24. P@b2 {+8.01/14 0.8} P@b4 25. P@a4 {+6.33/15 3} P@a2 26. P@a3
{+2.88/16 2.6} P@a5 27. P@h9 {+2.53/16 3} P@a1 28. P@e9 {+2.61/16 2.7} P@h2
29. P@g7 {+2.29/16 3} P@i7 30. P@i8 {+5.06/13 2.8} P@f2 31. P@f7
{+4.66/17 2.8} P@g1 32. P@c2 {+5.00/16 4} P@b2 33. P@b9 {+4.62/19 4} P@a3
34. P@d6 {+4.58/20 3} P@c9 35. P@c8 {+4.84/19 4} P@e1 36. P@g9 {+4.97/21 4}
P@i1 37. P@d8 {+4.15/20 4} P@g6 38. P@f9 {+3.86/20 3} P@g8 39. P@a9
{+2.25/19 3} P@b1 40. P@f6 {-46.58/19 6} pass 41. P@d9 {-49.26/20 5} pass
{White resigns} 0-1



On Sat, Mar 26, 2011 at 12:23 PM, Michael Williams <
[email protected]> wrote:

> You allow eye-filling in the playouts?  Or anywhere?  It should never
> be allowed.  Top priority.
>
>
> On Sat, Mar 26, 2011 at 11:49 AM, Daniel Shawul <[email protected]> wrote:
> > I am using 400 now and I almost never get intermediate cuts, i.e. it plays
> > out to the end.
> > But the shallow tactical search depth (depth = 2) is still not helping it.
> > Is that really what I should expect to get at the start position? I mean,
> > no matter what heavy playout scheme I use, if it can't get deeper, I don't
> > see how it is going to beat the alpha-beta version.
> > Please, I need an answer on this. I still have to add code to avoid filling
> > eyes and to bias the playouts.
> >
> > Here is the log file for the start position. pps = playouts per second,
> > visits = total playouts,
> > nodes = total nodes in the dynamic tree. Do you see anything suspicious
> > about it?
> >
> > [st = 2813ms, mt = 59250ms , moves_left 40]
> > Tree : nodes 59341 depth 2 pps 9747 visits 27419
> > @@@@   0.41      58      143
> > g5   0.42      76      181
> > h5   0.45     146      325
> > i5   0.36      26       72
> > a6   0.44     122      275
> > b6   0.46     164      361
> > c6   0.48     471      972
> > d6   0.47     283      601
> > e6   0.48     430      891
> > f6   0.47     292      618
> > g6   0.49     584     1195
> > h6   0.47     289      612
> > i6   0.40      53      131
> > a7   0.43      81      191
> > b7   0.35      21       62
> > c7   0.48     328      689
> > d7   0.46     162      356
> > e7   0.48     390      812
> > f7   0.41      57      139
> > g7   0.46     177      388
> > h7   0.36      25       70
> > i7   0.40      43      110
> > a8   0.35      22       63
> > b8   0.36      25       70
> > c8   0.49     528     1085
> > d8   0.00       0        8
> > e8   0.47     306      646
> > f8   0.39      44      112
> > g8   0.44     116      263
> > h8   0.44     106      242
> > i8   0.42      75      178
> > a9   0.36      25       70
> > b9   0.38      31       84
> > c9   0.14       2       14
> > d9   0.39      39      102
> > e9   0.40      50      124
> > f9   0.38      32       85
> > g9   0.42      79      186
> > h9   0.39      43      109
> > i9   0.21       4       21
> > e5   0.48     412      856
> > d5   0.46     172      378
> > c5   0.49     708     1438    -> Best move played: winning rate = 0.49, visits = 1438, wins = 708
> > b5   0.38      32       86
> > a5   0.46     188      410
> > i4   0.38      32       85
> > h4   0.31      13       43
> > g4   0.48     444      919
> > f4   0.48     452      936
> > e4   0.49     642     1309
> > d4   0.48     396      824
> > c4   0.34      19       57
> > b4   0.46     163      358
> > a4   0.46     188      409
> > i3   0.46     207      447
> > h3   0.29      11       39
> > g3   0.43      86      200
> > f3   0.41      56      138
> > e3   0.47     280      595
> > d3   0.47     283      600
> > c3   0.46     188      409
> > b3   0.47     231      497
> > a3   0.46     167      367
> > i2   0.37      28       76
> > h2   0.44     104      239
> > g2   0.45     155      342
> > f2   0.39      39      102
> > e2   0.27       9       33
> > d2   0.25       6       26
> > c2   0.43      89      207
> > b2   0.43      94      217
> > a2   0.43      81      191
> > i1   0.41      54      133
> > h1   0.14       2       14
> > g1   0.42      63      153
> > f1   0.35      21       62
> > e1   0.46     187      407
> > d1   0.48     314      662
> > c1   0.46     174      381
> > b1   0.36      25       70
> > a1   0.32      15       48
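Going by the annotation on the c5 row, each row of the log reads: move, winning
rate, wins, visits. The short Python sketch below copies a few rows and checks
two things: that the reported rate agrees with wins / visits for the heavily
visited moves, and how small a share of the 27419 total playouts the best move
received (roughly 5%), which goes some way to explaining the shallow depth.
The helper names are made up for this example. Note that some sparsely visited
rows in the log do not match exactly (b7 shows 0.35 while 21/62 is about 0.34),
possibly because the wins column is printed as an integer.

```python
# A few rows copied from the log above: move, winning rate, wins, visits.
LOG_ROWS = """\
c5   0.49     708     1438
g6   0.49     584     1195
e4   0.49     642     1309
c8   0.49     528     1085
d8   0.00       0        8
"""
TOTAL_VISITS = 27419  # "visits" value from the Tree line of the log


def parse_rows(text):
    """Split each log row into (move, rate, wins, visits)."""
    rows = []
    for line in text.splitlines():
        move, rate, wins, visits = line.split()
        rows.append((move, float(rate), int(wins), int(visits)))
    return rows


def check_rates(rows, tol=0.005):
    """True if every reported rate is within `tol` of wins / visits."""
    return all(abs(wins / visits - rate) <= tol
               for _, rate, wins, visits in rows)


def most_visited(rows):
    """Return the row with the largest visit count."""
    return max(rows, key=lambda row: row[3])


rows = parse_rows(LOG_ROWS)
best = most_visited(rows)
share = best[3] / TOTAL_VISITS
print(best[0], check_rates(rows), round(share, 3))
```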
> >
> >
> >
> > On Sat, Mar 26, 2011 at 11:33 AM, Erik van der Werf
> > <[email protected]> wrote:
> >>
> >> I agree, if you use a hard limit it should be much higher (probably
> >> something like twice the board surface is ok).
> >>
> >> 110 moves is just an observation of the average playout length for the
> >> empty 9x9 board. With smarter playouts that average tends to become
> >> lower, but the distribution may still have a long tail.
> >>
> >> Erik
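Erik's rule of thumb (a hard cap of about twice the board surface) and the
partial playouts described further down the thread can be sketched together as
follows. This is an illustrative Python outline, not code from any of the
engines mentioned; `playout_cap`, `capped_playout` and the toy pile game used to
exercise them are hypothetical names introduced for the example, and the game
is passed in as plain callbacks so the loop itself stays generic.

```python
import random


def playout_cap(board_size, factor=2):
    """Hard move limit: `factor` times the number of board points."""
    return factor * board_size * board_size


def capped_playout(state, legal_moves, apply_move, is_terminal, max_ply, rng):
    """Play uniformly random moves until the game ends or max_ply is hit.

    Returns (final_state, plies_played, finished). Assumes legal_moves()
    is never empty for a non-terminal state (in Go, pass is always legal).
    """
    ply = 0
    while ply < max_ply and not is_terminal(state):
        move = rng.choice(legal_moves(state))
        state = apply_move(state, move)
        ply += 1
    return state, ply, is_terminal(state)


# Toy stand-in for a real game: a pile of stones, each move removes 1 or 2.
def toy_moves(pile):
    return [1] if pile == 1 else [1, 2]


def toy_apply(pile, move):
    return pile - move


def toy_over(pile):
    return pile == 0


rng = random.Random(0)
final, plies, finished = capped_playout(
    10, toy_moves, toy_apply, toy_over, playout_cap(9), rng)
print(playout_cap(9), finished)
```

When `finished` comes back false, the caller has to score the position some
other way, for example with the static evaluation used for the partial
playouts in this thread.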
> >>
> >>
> >> On Sat, Mar 26, 2011 at 3:36 PM, Rémi Coulom <[email protected]> wrote:
> >> > I'd recommend more than 110. Maybe 200 is better. In Crazy Stone, I use
> >> > no limit, and test for superko.
> >> >
> >> > Rémi
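Testing for superko instead of capping the playout length amounts to
remembering every position that has occurred and rejecting a move that would
recreate one. The Python sketch below shows the positional variant (board
contents only, side to move ignored). It is a guess at the general idea, not
Crazy Stone's implementation; the class name `SuperkoHistory` and the tuple of
row strings used as a position key are assumptions, and a real engine would
more likely store an incrementally updated hash.

```python
class SuperkoHistory:
    """Remembers whole-board positions to detect positional superko."""

    def __init__(self):
        self._seen = set()

    @staticmethod
    def _key(board):
        # board: sequence of row strings; a tuple makes it hashable.
        return tuple(board)

    def record(self, board):
        """Store a position that has actually occurred in the game."""
        self._seen.add(self._key(board))

    def would_repeat(self, board_after_move):
        """True if playing the move recreates an earlier position."""
        return self._key(board_after_move) in self._seen


history = SuperkoHistory()
start = ["...", "...", "..."]
after_first_move = [".X.", "...", "..."]
history.record(start)
history.record(after_first_move)
print(history.would_repeat(start), history.would_repeat(["...", ".O.", "..."]))
```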
> >> >
> >> > On 26 mars 2011, at 15:32, Daniel Shawul wrote:
> >> >
> >> >> Sorry, 81 moves was a bad estimate on my part. I am actually using 96
> >> >> moves. I will change that to 110 or above
> >> >> and see what effect it has. I will also take Rémi's suggestion,
> >> >> i.e. to bias the move selection process.
> >> >> For the alpha-beta program, I have a decent move ordering algorithm
> >> >> and qsearch. I guess I can borrow some of that.
> >> >>
> >> >> In the meantime, I found a paper using UCT for Chinese checkers and
> >> >> other games
> >> >> http://www.google.com/url?sa=t&source=web&cd=7&ved=0CD4QFjAG&url=http%3A%2F%2Fweb.cs.du.edu%2F~sturtevant%2Fpapers%2Fmpuct_icga.pdf&rct=j&q=UCT%20for%20checkers&ei=mPiNTcqdBsSO0QG20-nWCw&usg=AFQjCNGFgMMMG8xMtawvx-3rQtwXPhfWxQ&cad=rja
> >> >> and also
> >> >> some fun Java programs using UCT for checkers. It seems UCT is indeed
> >> >> competitive in checkers.
> >> >> I must say I didn't expect that at all. I think the forced nature of
> >> >> captures helps to improve the tactical awareness of the MC simulations.
> >> >> Is that so?
> >> >>
> >> >>
> >> >> On Sat, Mar 26, 2011 at 8:52 AM, Erik van der Werf
> >> >> <[email protected]> wrote:
> >> >> Ah ok, I misunderstood.
> >> >>
> >> >> Still something seems to be wrong. On the empty 9x9 board I think most
> >> >> programs with random/light playouts play in the order of 110 moves.
> >> >> ~81 moves seems quite low; in my experience you can only get such low
> >> >> numbers to work well if you have a lot of knowledge in your playouts.
> >> >> Did you check the quality of the evaluations/playouts?
> >> >>
> >> >> If you want UCT to search deeper you need good priors and perhaps
> >> >> something like rave/amaf.
> >> >>
> >> >> Best,
> >> >> Erik
> >> >>
> >> >>
> >> >> On Sat, Mar 26, 2011 at 1:13 PM, Daniel Shawul <[email protected]>
> >> >> wrote:
> >> >> > Hello,
> >> >> > I am using Monte Carlo playouts for the UCT method. It can do about
> >> >> > 10k/sec.
> >> >> > The UCT tree is expanded to a depth of d = 3 in a 5 sec search; from
> >> >> > then onwards a random playout (with no bias) is carried out.
> >> >> > Actually it is a 'partial playout', which doesn't go to the end of
> >> >> > the game but only up to a depth of MAX_PLY = 96.
> >> >> > If the game has ended earlier, then a win/draw/loss is returned.
> >> >> > Otherwise I forcefully end the game by using a deterministic eval
> >> >> > and assign a WDL. For 9x9 go, most of the random playouts actually
> >> >> > end before move 81.
> >> >> > For the alpha-beta searcher, I do classical evaluation. With heavy
> >> >> > use of reductions I can get a depth of 14 half plies, which seems to
> >> >> > give it quite an edge against the UCT version.
> >> >> > Is the depth of expansion for the UCT tree too low (d = 3 in a 5 sec
> >> >> > search)? Should I lower the UCTK parameter to 0.1 or so, which seems
> >> >> > to give me a depth = 7 at the start position of 9x9 go? I am
> >> >> > confident my implementation is correct because it is working quite
> >> >> > well in my checkers program, contrary to my expectation.
> >> >> > thanks
> >> >> > Daniel
> >> >> >
> >> >> > On Sat, Mar 26, 2011 at 7:54 AM, Erik van der Werf
> >> >> > <[email protected]> wrote:
> >> >> >>
> >> >> >> It sounds like you're using a classical (deterministic) evaluation
> >> >> >> function.
> >> >> >> Try combining UCT with Monte Carlo evaluation.
> >> >> >>
> >> >> >> Erik
> >> >> >>
> >> >> >>
> >> >> >> On Sat, Mar 26, 2011 at 12:43 PM, Daniel Shawul <
> [email protected]>
> >> >> >> wrote:
> >> >> >> > Hello,
> >> >> >> > I am very new to UCT; I just implemented basic UCT for go
> >> >> >> > yesterday.
> >> >> >> > But with no success so far for go, I think mostly because it does
> >> >> >> > not search very deep (depth = 3 on a 5 sec search with the values
> >> >> >> > below).
> >> >> >> > I am using the following values as UCT parameters:
> >> >> >> > UCTK = sqrt(1/5) = 0.44     UCTN = 10 (visits after which the best
> >> >> >> > move is expanded)
> >> >> >> > Even if I lower UCTK down to 7 I get a maximum depth of d=7 at the
> >> >> >> > start position for a 5 sec search.
> >> >> >> > For how deep a search should I tune these parameters?
> >> >> >> > Before UCT, I had an alpha-beta searcher which sometimes plays on
> >> >> >> > CGOS.
> >> >> >> > It reached a level of ~1500, and this engine seems to be too
> >> >> >> > strong for the UCT version.
> >> >> >> > The UCT version just gets outsearched in some tactical positions,
> >> >> >> > and also in evaluation I think.
> >> >> >> > For example, I have an evaluation term which gives big bonuses for
> >> >> >> > connected strings, which seems to give an edge in a lot of games.
> >> >> >> > How do you introduce such eval terms in UCT?
> >> >> >> > But for my checkers program, to my big surprise, UCT made a
> >> >> >> > significant impact. The regular alpha-beta searcher averages a
> >> >> >> > depth of 25, but the UCT version is, I think, equally strong
> >> >> >> > judging from the games I saw. That was a surprise for me because
> >> >> >> > I thought UCT would work better for bushy trees and when the eval
> >> >> >> > has a lot of strategy. It also reached good depths, averaging
> >> >> >> > 16 plies.
> >> >> >> > My checkers eval has only material in it, so I don't know if UCT
> >> >> >> > is bringing strategy (distant information) to the game which the
> >> >> >> > other one doesn't have. The games are not really played out to
> >> >> >> > the end, but rather to MAX_PLY = 96, after which the material is
> >> >> >> > counted and a WDL score is assigned (I call it a partial playout).
> >> >> >> > Also, the fact that captures are forced seems to help a lot
> >> >> >> > because it doesn't make too many mistakes.
> >> >> >> > I also found some positions where it encounters problems similar
> >> >> >> > to ladders in go. But in the checkers case, these problems are
> >> >> >> > still solved correctly. The only problem is that it doesn't
> >> >> >> > report correct-looking winning rates.
> >> >> >> > For example, take a position with two kings where one of the kings
> >> >> >> > is chasing the other to the sides to mate it, but the losing king
> >> >> >> > can draw by making a series of correct moves to get itself to one
> >> >> >> > of the safe corners. The program displays winning rates of 0.01
> >> >> >> > (when it should have been more like 0.5), but it still manages
> >> >> >> > the draw!
> >> >> >> > thanks and apologies for the verbose email
> >> >> >> > Daniel
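For readers unfamiliar with the two parameters in this message, the sketch
below shows where they usually enter a UCT search. It assumes the standard UCB1
form, win rate plus UCTK * sqrt(ln(parent visits) / child visits); the thread
never shows the exact formula Daniel uses, so this is an assumption, and the
`Node` class and function names are hypothetical. UCTK scales the exploration
bonus, and UCTN is the visit count a leaf must reach before it is expanded.

```python
import math

UCTK = math.sqrt(1 / 5)   # exploration constant quoted in the message
UCTN = 10                 # visits after which a leaf is expanded


class Node:
    """Minimal tree node holding playout statistics."""

    def __init__(self, move=None):
        self.move = move
        self.wins = 0.0
        self.visits = 0
        self.children = []


def ucb1(child, parent_visits, uctk=UCTK):
    """Win rate plus exploration bonus; unvisited children come first."""
    if child.visits == 0:
        return math.inf
    exploit = child.wins / child.visits
    explore = uctk * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node, uctk=UCTK):
    """Pick the child with the highest UCB1 value."""
    return max(node.children, key=lambda ch: ucb1(ch, node.visits, uctk))


def should_expand(node, uctn=UCTN):
    """Expand a leaf only once it has collected enough visits."""
    return not node.children and node.visits >= uctn


root = Node()
root.visits = 100
strong, weak = Node("a"), Node("b")
strong.wins, strong.visits = 60.0, 80
weak.wins, weak.visits = 5.0, 20
root.children = [strong, weak]
print(select_child(root).move, select_child(root, uctk=5.0).move)
```

With the small constant the search keeps following the move with the better
win rate, which is what makes the tree deeper; with a large constant the less
visited move is preferred and the tree grows wider instead.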
> >> >> >> > _______________________________________________
> >> >> >> > Computer-go mailing list
> >> >> >> > [email protected]
> >> >> >> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> >> >> >> >
