Re: [Computer-go] UCT parameters and application to other games

Michael Williams Sat, 26 Mar 2011 09:23:59 -0700

You allow eye-filling in the playouts?  Or anywhere?  It should never
be allowed.  Top priority.
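In light playouts, this advice is commonly followed by never letting a player fill a point that looks like its own eye. The sketch below assumes a board stored as a list of rows holding '.', 'X' and 'O'. The "eye-like" test it uses is a widely used heuristic, not an exact eye definition: every orthogonal neighbour must be friendly, a centre point may have at most one enemy diagonal, and an edge or corner point may have none. Legality checks such as suicide and ko are deliberately left out. All names are hypothetical.

```python
EMPTY, BLACK, WHITE = ".", "X", "O"

def neighbours(x, y, size):
    """Orthogonal neighbours of (x, y) that lie on the board."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield nx, ny

def diagonals(x, y, size):
    """Diagonal neighbours of (x, y) that lie on the board."""
    for dx, dy in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < size and 0 <= ny < size:
            yield nx, ny

def is_eye_like(board, x, y, color):
    """Heuristic: does the empty point (x, y) look like an eye of `color`?"""
    size = len(board)
    if board[y][x] != EMPTY:
        return False
    # Every on-board orthogonal neighbour must be a friendly stone.
    if any(board[ny][nx] != color for nx, ny in neighbours(x, y, size)):
        return False
    # Count the diagonal points held by the opponent.
    opponent = WHITE if color == BLACK else BLACK
    diag = list(diagonals(x, y, size))
    enemy = sum(1 for nx, ny in diag if board[ny][nx] == opponent)
    # Edge/corner points tolerate no enemy diagonal, centre points tolerate one.
    return enemy == 0 if len(diag) < 4 else enemy <= 1

def playout_candidates(board, color):
    """Empty points a random playout may try for `color`, excluding its own
    eye-like points. Suicide and ko legality are NOT checked here."""
    size = len(board)
    return [(x, y) for y in range(size) for x in range(size)
            if board[y][x] == EMPTY and not is_eye_like(board, x, y, color)]
```

A playout then draws its random move from `playout_candidates` and passes when the list is empty. This is a common way to guarantee that light playouts terminate.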



On Sat, Mar 26, 2011 at 11:49 AM, Daniel Shawul <[email protected]> wrote:
> I am using a playout limit of 400 moves now, and I almost never get
> intermediate cuts, i.e. the playouts go to the end of the game.
> But the shallow tactical search depth (depth = 2) is still not helping.
> Is that really what I should expect at the start position? I mean, no
> matter how heavy a playout scheme I use, if the search can't get deeper I
> don't see how it is going to beat the alpha-beta engine.
> Please, I need an answer on this. I still have to add code to avoid filling
> eyes and to bias the playouts.
>
> Here is the log file for the start position. pps = playouts per second,
> visits = total playouts, nodes = total nodes in the dynamic tree. Each row
> gives a move, its winning rate, its wins and its visits. Do you see anything
> suspicious about it?
>
> [st = 2813ms, mt = 59250ms , moves_left 40]
> Tree : nodes 59341 depth 2 pps 9747 visits 27419
> @@@@   0.41      58      143
> g5   0.42      76      181
> h5   0.45     146      325
> i5   0.36      26       72
> a6   0.44     122      275
> b6   0.46     164      361
> c6   0.48     471      972
> d6   0.47     283      601
> e6   0.48     430      891
> f6   0.47     292      618
> g6   0.49     584     1195
> h6   0.47     289      612
> i6   0.40      53      131
> a7   0.43      81      191
> b7   0.35      21       62
> c7   0.48     328      689
> d7   0.46     162      356
> e7   0.48     390      812
> f7   0.41      57      139
> g7   0.46     177      388
> h7   0.36      25       70
> i7   0.40      43      110
> a8   0.35      22       63
> b8   0.36      25       70
> c8   0.49     528     1085
> d8   0.00       0        8
> e8   0.47     306      646
> f8   0.39      44      112
> g8   0.44     116      263
> h8   0.44     106      242
> i8   0.42      75      178
> a9   0.36      25       70
> b9   0.38      31       84
> c9   0.14       2       14
> d9   0.39      39      102
> e9   0.40      50      124
> f9   0.38      32       85
> g9   0.42      79      186
> h9   0.39      43      109
> i9   0.21       4       21
> e5   0.48     412      856
> d5   0.46     172      378
> c5   0.49     708     1438    -> Best move played   winning rate = 0.49 visits = 1438 wins = 708
> b5   0.38      32       86
> a5   0.46     188      410
> i4   0.38      32       85
> h4   0.31      13       43
> g4   0.48     444      919
> f4   0.48     452      936
> e4   0.49     642     1309
> d4   0.48     396      824
> c4   0.34      19       57
> b4   0.46     163      358
> a4   0.46     188      409
> i3   0.46     207      447
> h3   0.29      11       39
> g3   0.43      86      200
> f3   0.41      56      138
> e3   0.47     280      595
> d3   0.47     283      600
> c3   0.46     188      409
> b3   0.47     231      497
> a3   0.46     167      367
> i2   0.37      28       76
> h2   0.44     104      239
> g2   0.45     155      342
> f2   0.39      39      102
> e2   0.27       9       33
> d2   0.25       6       26
> c2   0.43      89      207
> b2   0.43      94      217
> a2   0.43      81      191
> i1   0.41      54      133
> h1   0.14       2       14
> g1   0.42      63      153
> f1   0.35      21       62
> e1   0.46     187      407
> d1   0.48     314      662
> c1   0.46     174      381
> b1   0.36      25       70
> a1   0.32      15       48
>
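The c5 line (wins = 708, visits = 1438, rate 0.49) suggests the columns are move, winning rate, wins and visits. The hypothetical helper below parses a few rows copied verbatim from the log. It checks that the printed rate matches wins / visits to two decimals and picks the most-visited move, which is one common final-move rule. Whether the program above actually uses that rule is an assumption.

```python
# A few rows copied verbatim from the log above: move, rate, wins, visits.
LOG_ROWS = """\
c5   0.49     708     1438
e4   0.49     642     1309
g6   0.49     584     1195
d8   0.00       0        8
"""

def parse_rows(text):
    """Return (move, rate, wins, visits) tuples from whitespace-separated rows."""
    rows = []
    for line in text.splitlines():
        move, rate, wins, visits = line.split()
        rows.append((move, float(rate), int(wins), int(visits)))
    return rows

def win_rate(wins, visits):
    return wins / visits if visits else 0.0

def most_visited(rows):
    """'Robust child' rule: play the move with the most visits."""
    return max(rows, key=lambda row: row[3])[0]

rows = parse_rows(LOG_ROWS)
for move, rate, wins, visits in rows:
    # The printed rate is wins / visits rounded to two decimals.
    assert abs(rate - win_rate(wins, visits)) <= 0.005, move
print(most_visited(rows))  # c5
```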
>
>
> On Sat, Mar 26, 2011 at 11:33 AM, Erik van der Werf
> <[email protected]> wrote:
>>
>> I agree; if you use a hard limit, it should be much higher (something
>> like twice the number of board points is probably OK).
>>
>> 110 moves is just an observation of the average playout length for the
>> empty 9x9 board. With smarter playouts that average tends to become
>> lower, but the distribution may still have a long tail.
>>
>> Erik
>>
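Erik's rule of thumb in numbers: twice the 81 points of a 9x9 board gives a cap of 162 moves, comfortably above the ~110-move average he mentions. The following is a minimal sketch with hypothetical names. It computes the cap and records how often playouts actually hit it. With a sensible limit, the cut-off should only catch the long tail of the length distribution.

```python
def playout_cap(board_size, factor=2):
    """Hard playout-length limit: `factor` times the number of board points."""
    return factor * board_size * board_size

class CapMonitor:
    """Tracks playout lengths so that a too-low cap shows up as frequent cut-offs."""

    def __init__(self, cap):
        self.cap = cap
        self.playouts = 0
        self.capped = 0
        self.total_length = 0

    def record(self, length):
        """Call once per playout with the number of moves it lasted."""
        self.playouts += 1
        self.total_length += length
        if length >= self.cap:
            self.capped += 1

    def mean_length(self):
        return self.total_length / self.playouts if self.playouts else 0.0

    def capped_fraction(self):
        return self.capped / self.playouts if self.playouts else 0.0

monitor = CapMonitor(playout_cap(9))
print(monitor.cap)  # 162
```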
>>
>> On Sat, Mar 26, 2011 at 3:36 PM, Rémi Coulom <[email protected]> wrote:
>> > I'd recommend more than 110. Maybe 200 is better. In Crazy Stone, I use
>> > no limit, and test for superko.
>> >
>> > Rémi
>> >
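Rémi does not say how Crazy Stone tests for superko. A standard technique is Zobrist hashing, sketched below. The sketch assumes positional superko, meaning a move may not recreate any earlier whole-board position. It also assumes a board indexed 0..80 with colour 0 = black and 1 = white. Class and method names are hypothetical, and hash collisions are possible in principle but very rare with 64-bit keys.

```python
import random

class ZobristSuperko:
    """Positional-superko check using Zobrist hashing.

    A position is a dict mapping point index -> colour index (0 black, 1 white).
    Equal hashes are treated as equal positions."""

    def __init__(self, num_points, seed=12345):
        rng = random.Random(seed)
        # One random 64-bit key per (point, colour) pair.
        self.keys = [[rng.getrandbits(64) for _ in range(2)]
                     for _ in range(num_points)]
        self.seen = set()

    def hash_position(self, stones):
        """Full recomputation: XOR of the keys of all stones on the board."""
        h = 0
        for point, colour in stones.items():
            h ^= self.keys[point][colour]
        return h

    def toggle(self, h, point, colour):
        """Incremental update: XOR a stone in (placement) or out (capture)."""
        return h ^ self.keys[point][colour]

    def record(self, h):
        """Remember a position that has occurred in the game or playout."""
        self.seen.add(h)

    def repeats(self, h):
        """True if this position occurred before, i.e. a superko violation."""
        return h in self.seen
```

In a playout, the hash after a candidate move is obtained by toggling the placed stone and every captured stone. The move is rejected if `repeats` returns True.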
>> > On 26 March 2011, at 15:32, Daniel Shawul wrote:
>> >
>> >> Sorry, 81 moves was a bad estimate on my part. I am actually using 96
>> >> moves. I will change that to 110 or more and see what effect it has.
>> >> I will also take Rémi's suggestion, i.e. to bias the move selection
>> >> process. For the alpha-beta program I have a decent move ordering
>> >> algorithm and qsearch; I guess I can borrow some of that.
>> >>
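One simple way to reuse move-ordering knowledge in playouts is weighted ("roulette wheel") selection: each legal move is picked with probability proportional to a heuristic weight. In the sketch below, `weight_fn` is a hypothetical stand-in for an existing alpha-beta ordering score. The small floor keeps every legal move possible, so the playout stays random rather than deterministic.

```python
import random

def biased_choice(moves, weight_fn, rng=random, floor=1e-3):
    """Pick one playout move with probability proportional to weight_fn(move).

    weight_fn is a hypothetical move-ordering score (captures first, killers,
    history, ...). The floor gives zero-scored moves a small chance too."""
    weights = [max(weight_fn(m), floor) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]
```

Keeping the floor small but non-zero is a design choice. It prevents the heuristic from completely ruling out moves it misjudges.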
>> >> In the meantime, I found a paper using UCT for Chinese checkers and
>> >> other games:
>> >> http://web.cs.du.edu/~sturtevant/papers/mpuct_icga.pdf,
>> >> and also some fun Java programs using UCT for checkers. It seems UCT
>> >> is indeed competitive in checkers. I must say I didn't expect that at
>> >> all. I think the forced nature of captures helps improve the tactical
>> >> awareness of the MC simulations. Is that so?
>> >>
>> >>
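Forced captures may help because a light playout can then only choose among captures whenever one exists, so the random player never ignores a hanging piece. The sketch below uses a hypothetical `Move` record, and its square numbers are placeholders. It filters the candidate list that way. Whether this fully explains the checkers results is left open in the thread.

```python
import random
from typing import NamedTuple, Tuple

class Move(NamedTuple):
    path: Tuple[int, ...]      # squares visited, starting square first
    captured: Tuple[int, ...]  # squares of the pieces jumped (empty if none)

def playout_move(moves, rng=random, majority_rule=False):
    """Choose a random playout move while obeying the forced-capture rule.

    If any capture exists, only captures are eligible. With majority_rule=True
    (as in international draughts) only the captures taking the most pieces are
    eligible; English/American checkers lets the player pick any capture."""
    if not moves:
        return None
    captures = [m for m in moves if m.captured]
    pool = captures or moves
    if captures and majority_rule:
        most = max(len(m.captured) for m in captures)
        pool = [m for m in captures if len(m.captured) == most]
    return rng.choice(pool)
```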
>> >> On Sat, Mar 26, 2011 at 8:52 AM, Erik van der Werf
>> >> <[email protected]> wrote:
>> >> Ah ok, I misunderstood.
>> >>
>> >> Still, something seems to be wrong. On the empty 9x9 board I think most
>> >> programs with random/light playouts play on the order of 110 moves.
>> >> ~81 moves seems quite low; in my experience you can only get such low
>> >> numbers to work well if you have a lot of knowledge in your playouts.
>> >> Did you check the quality of the evaluations/playouts?
>> >>
>> >> If you want UCT to search deeper, you need good priors and perhaps
>> >> something like RAVE/AMAF.
>> >>
>> >> Best,
>> >> Erik
>> >>
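Below is a sketch of how RAVE/AMAF is often blended into child selection. It uses a widely cited hand-tuned schedule, beta = sqrt(k / (3n + k)), where n is the child's real visit count. AMAF statistics dominate while n is small and fade as real playouts accumulate. The `Child` record, the default k = 1000 and the update helper are illustrative assumptions, and other schedules exist.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    move: str
    wins: float = 0.0        # results of playouts that went through this child
    visits: int = 0
    amaf_wins: float = 0.0   # results of playouts where this move was played later
    amaf_visits: int = 0

def rave_value(child, k=1000.0):
    """Blend the UCT mean with the AMAF mean; AMAF dominates while visits are few."""
    q = child.wins / child.visits if child.visits else 0.0
    q_amaf = child.amaf_wins / child.amaf_visits if child.amaf_visits else 0.0
    beta = math.sqrt(k / (3 * child.visits + k))
    return (1 - beta) * q + beta * q_amaf

def update_amaf(children, later_moves, result):
    """After a playout, credit every child whose move the same player made later.

    later_moves: moves played by the player to move at this node, anywhere after it.
    result: 1.0 for a win, 0.0 for a loss, from that player's point of view."""
    for child in children:
        if child.move in later_moves:
            child.amaf_visits += 1
            child.amaf_wins += result
```

`rave_value` then takes the place of the plain win rate in the selection formula. An exploration term can still be added on top.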
>> >>
>> >> On Sat, Mar 26, 2011 at 1:13 PM, Daniel Shawul <[email protected]>
>> >> wrote:
>> >> > Hello,
>> >> > I am using Monte Carlo playouts for the UCT method. It can do about
>> >> > 10k playouts/sec.
>> >> > The UCT tree is expanded to a depth of d = 3 in a 5 sec search; from
>> >> > there onwards a random playout (with no bias) is carried out. Actually
>> >> > it is a 'partial playout' which doesn't go to the end of the game, but
>> >> > only up to a depth of MAX_PLY = 96.
>> >> > If the game has ended earlier, a win/draw/loss is returned. Otherwise
>> >> > I forcefully end the game by using a deterministic eval and assign a
>> >> > WDL. For 9x9 go, most random playouts actually end before move 81.
>> >> > For the alpha-beta searcher I use a classical evaluation. With heavy
>> >> > use of reductions I can get a depth of 14 half-plies, which seems to
>> >> > give it quite an edge against the UCT version.
>> >> > Is the depth of expansion for the UCT tree too low (d = 3 in a 5 sec
>> >> > search)? Should I lower the UCTK parameter to 0.1 or so, which seems
>> >> > to give me depth = 7 at the start position of 9x9 go? I am confident
>> >> > my implementation is correct, because it is working quite well in my
>> >> > checkers program, contrary to my expectations.
>> >> > thanks
>> >> > Daniel
>> >> >
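The partial playout described above plays random moves up to MAX_PLY = 96 and then turns a deterministic eval into a win/draw/loss. It can be sketched generically as follows. The callback names and `draw_margin` are hypothetical, results are from the root player's point of view, and the replies in this thread suggest a much higher limit for go.

```python
import random

WIN, DRAW, LOSS = 1.0, 0.5, 0.0

def partial_playout(state, legal_moves, play, terminal_result, evaluate,
                    max_ply=96, draw_margin=0, rng=random):
    """Random playout cut off after max_ply moves, falling back on a static eval.

    terminal_result(state) -> WIN/DRAW/LOSS for the root player, or None if not over.
    evaluate(state)        -> static score for the root player (e.g. material).
    legal_moves(state) is assumed to be non-empty whenever the game is not over."""
    for _ in range(max_ply):
        result = terminal_result(state)
        if result is not None:
            return result                       # the game ended naturally
        state = play(state, rng.choice(legal_moves(state)))
    result = terminal_result(state)
    if result is not None:
        return result
    score = evaluate(state)                     # cut-off: convert the eval to WDL
    if score > draw_margin:
        return WIN
    if score < -draw_margin:
        return LOSS
    return DRAW
```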
>> >> > On Sat, Mar 26, 2011 at 7:54 AM, Erik van der Werf
>> >> > <[email protected]> wrote:
>> >> >>
>> >> >> It sounds like you're using a classical (deterministic) evaluation
>> >> >> function.
>> >> >> Try combining UCT with Monte Carlo evaluation.
>> >> >>
>> >> >> Erik
>> >> >>
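The following is a compact sketch of the basic loop Erik is pointing at. It uses UCB selection with the UCTK constant, expands a leaf once it has UCTN visits, runs a random playout as the Monte Carlo evaluation, and then backpropagates. UCTK = 0.44 and UCTN = 10 are the values quoted in the first message below. The toy Nim game is a hypothetical stand-in that only exists so the code can be run and checked; an engine would plug in its own go or checkers move generator, and the root must not be a terminal position.

```python
import math
import random

UCTK = 0.44   # exploration constant (value quoted in the first message)
UCTN = 10     # a leaf is expanded once it has this many visits

class Node:
    def __init__(self, state, move=None, parent=None):
        self.state, self.move, self.parent = state, move, parent
        self.children = []
        self.wins = 0.0   # wins for the player who made `move`
        self.visits = 0

def ucb(child, parent_visits):
    if child.visits == 0:
        return float("inf")
    return (child.wins / child.visits
            + UCTK * math.sqrt(math.log(parent_visits) / child.visits))

def uct_search(root_state, iterations, game, rng):
    root = Node(root_state)
    root.children = [Node(game.play(root_state, m), m, root)
                     for m in game.moves(root_state)]
    for _ in range(iterations):
        node = root
        while node.children:                                  # 1. selection
            parent = node
            node = max(parent.children, key=lambda c: ucb(c, parent.visits))
        if node.visits >= UCTN and not game.is_terminal(node.state):
            node.children = [Node(game.play(node.state, m), m, node)  # 2. expansion
                             for m in game.moves(node.state)]
            node = rng.choice(node.children)
        winner = game.random_playout(node.state, rng)         # 3. MC evaluation
        while node is not None:                               # 4. backpropagation
            node.visits += 1
            if node.parent is not None and winner == game.just_moved(node.state):
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda c: c.visits).move    # most-visited move

class Nim:
    """Toy test game: take 1 or 2 stones; whoever takes the last stone wins.
    A state is (stones_left, player_to_move)."""
    @staticmethod
    def moves(state):
        return [m for m in (1, 2) if m <= state[0]]
    @staticmethod
    def play(state, m):
        return (state[0] - m, 1 - state[1])
    @staticmethod
    def is_terminal(state):
        return state[0] == 0
    @staticmethod
    def just_moved(state):
        return 1 - state[1]
    @staticmethod
    def random_playout(state, rng):
        while state[0] > 0:
            state = Nim.play(state, rng.choice(Nim.moves(state)))
        return Nim.just_moved(state)   # the player who took the last stone
```

A smaller UCTK weights the exploration term less, so visits concentrate on the current best line and the tree grows deeper along it. That matches the depth increase reported for UCTK = 0.1. Returning the most-visited root move is one common choice; returning the best win rate is another.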
>> >> >>
>> >> >> On Sat, Mar 26, 2011 at 12:43 PM, Daniel Shawul <[email protected]>
>> >> >> wrote:
>> >> >> > Hello,
>> >> >> > I am very new to UCT; I just implemented basic UCT for go yesterday.
>> >> >> > But with no success so far for go, I think mostly because it does
>> >> >> > not search very deep (depth = 3 in a 5 sec search with these values).
>> >> >> > I am using the following values as UCT parameters:
>> >> >> > UCTK = sqrt(1/5) = 0.44, UCTN = 10 (visits after which the best move
>> >> >> > is expanded)
>> >> >> > Even if I lower UCTK substantially, I get a maximum depth of only
>> >> >> > d = 7 at the start position for a 5 sec search.
>> >> >> > For what search depth should I tune these parameters?
>> >> >> > Before UCT, I had an alpha-beta searcher, which sometimes plays on
>> >> >> > CGOS. It reached a level of ~1500, and that engine seems to be too
>> >> >> > strong for the UCT version. The UCT version just gets outsearched in
>> >> >> > some tactical positions, and I think also in evaluation.
>> >> >> > For example, I have an evaluation term which gives big bonuses for
>> >> >> > connected strings, which seems to give an edge in a lot of games.
>> >> >> > How do you introduce such eval terms in UCT?
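One common answer is to turn the evaluation into prior knowledge. When a node is created, each child starts with a number of virtual playouts whose win rate comes from the heuristic. The tree then initially prefers moves the eval likes, and real playouts take over as they accumulate. Progressive bias is a related alternative. The logistic squashing, its scale and the weight of 20 virtual playouts below are illustrative assumptions, not tuned values.

```python
import math
from dataclasses import dataclass

@dataclass
class Child:
    move: str
    wins: float = 0.0
    visits: float = 0.0

def eval_to_prior(score, scale=100.0):
    """Squash a static eval score (e.g. a connected-strings bonus) into (0, 1)."""
    x = max(-50.0, min(50.0, score / scale))   # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

def child_with_prior(move, prior, weight=20.0):
    """Seed a new child with `weight` virtual playouts at win rate `prior`."""
    return Child(move, wins=prior * weight, visits=weight)

def add_result(child, won):
    """Record one real playout result; real results gradually outweigh the prior."""
    child.visits += 1
    child.wins += 1.0 if won else 0.0

def mean(child):
    return child.wins / child.visits
```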
>> >> >> > But for my checkers program, to my big surprise, UCT made a
>> >> >> > significant impact. The regular alpha-beta searcher averages
>> >> >> > depth = 25, but I think the UCT version is equally strong, judging
>> >> >> > from the games I saw. That was a surprise because I thought UCT
>> >> >> > would work better for bushy trees and when the eval has a lot of
>> >> >> > strategy. It also reached good depths, averaging 16 plies.
>> >> >> > My checkers eval has only material in it, so I don't know whether
>> >> >> > UCT is bringing strategy (distant information) to the game that the
>> >> >> > other one doesn't have. The games are not really played out to the
>> >> >> > end, but only up to MAX_PLY = 96, after which the material is
>> >> >> > counted and a WDL score is assigned (I call it a partial playout).
>> >> >> > Also, the fact that captures are forced seems to help a lot, because
>> >> >> > the playouts don't make too many mistakes.
>> >> >> > I also found some positions where it encounters problems similar to
>> >> >> > ladders in go. But in the checkers case, these problems are still
>> >> >> > solved correctly. The only problem is that it doesn't report
>> >> >> > correct-looking winning rates.
>> >> >> > For example, take a position with two kings where one king is
>> >> >> > chasing the other toward the edges to trap it, but the losing king
>> >> >> > can draw by making a series of correct moves to get itself to one of
>> >> >> > the safe corners. The program displays winning rates of 0.01 (when
>> >> >> > it should have been more like 0.5), but it still manages the draw!
>> >> >> > Thanks, and apologies for the verbose email.
>> >> >> > Daniel
>> >> >> > _______________________________________________
>> >> >> > Computer-go mailing list
>> >> >> > [email protected]
>> >> >> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>> >> >> >