Re: [Computer-go] UCT parameters and application to other games

Daniel Shawul Sat, 26 Mar 2011 08:49:39 -0700

I am using 400 now and I almost never get intermediate cuts i.e plays out to
the end.
But still the shallow tactical search depth (depth = 2) is not helping it.
Is that really what I should expect to get at the start position? I mean no
matter what heavy
playout scheme I use if it can't get deeper , I don't see how it is going to
beat it.
Please I need an answer on this. I still have to add code to avoid filling
eyes and bias
the playouts.


Here is log file for the start position. pps = the playouts per second,
visits = total playouts.
nodes = total nodes in the dynamic tree. Do you see anything suspicious
about it ?

[st = 2813ms, mt = 59250ms , moves_left 40]
Tree : nodes 59341 depth 2 pps 9747 visits 27419
@@@@   0.41      58      143
g5   0.42      76      181
h5   0.45     146      325
i5   0.36      26       72
a6   0.44     122      275
b6   0.46     164      361
c6   0.48     471      972
d6   0.47     283      601
e6   0.48     430      891
f6   0.47     292      618
g6   0.49     584     1195
h6   0.47     289      612
i6   0.40      53      131
a7   0.43      81      191
b7   0.35      21       62
c7   0.48     328      689
d7   0.46     162      356
e7   0.48     390      812
f7   0.41      57      139
g7   0.46     177      388
h7   0.36      25       70
i7   0.40      43      110
a8   0.35      22       63
b8   0.36      25       70
c8   0.49     528     1085
d8   0.00       0        8
e8   0.47     306      646
f8   0.39      44      112
g8   0.44     116      263
h8   0.44     106      242
i8   0.42      75      178
a9   0.36      25       70
b9   0.38      31       84
c9   0.14       2       14
d9   0.39      39      102
e9   0.40      50      124
f9   0.38      32       85
g9   0.42      79      186
h9   0.39      43      109
i9   0.21       4       21
e5   0.48     412      856
d5   0.46     172      378
c5   0.49     708     1438    -> Best move played   winning rate = 0.49
visits = 1438 wins = 708
b5   0.38      32       86
a5   0.46     188      410
i4   0.38      32       85
h4   0.31      13       43
g4   0.48     444      919
f4   0.48     452      936
e4   0.49     642     1309
d4   0.48     396      824
c4   0.34      19       57
b4   0.46     163      358
a4   0.46     188      409
i3   0.46     207      447
h3   0.29      11       39
g3   0.43      86      200
f3   0.41      56      138
e3   0.47     280      595
d3   0.47     283      600
c3   0.46     188      409
b3   0.47     231      497
a3   0.46     167      367
i2   0.37      28       76
h2   0.44     104      239
g2   0.45     155      342
f2   0.39      39      102
e2   0.27       9       33
d2   0.25       6       26
c2   0.43      89      207
b2   0.43      94      217
a2   0.43      81      191
i1   0.41      54      133
h1   0.14       2       14
g1   0.42      63      153
f1   0.35      21       62
e1   0.46     187      407
d1   0.48     314      662
c1   0.46     174      381
b1   0.36      25       70
a1   0.32      15       48



On Sat, Mar 26, 2011 at 11:33 AM, Erik van der Werf <
[email protected]> wrote:

> I agree, if you use a hard limit it should be much higher (probably
> something like twice the board surface is ok).
>
> 110 moves is just an observation of the average playout length for the
> empty 9x9 board. With smarter playouts that average tends to become
> lower, but the distribution may still have a long tail.
>
> Erik
>
>
> On Sat, Mar 26, 2011 at 3:36 PM, Rémi Coulom <[email protected]> wrote:
> > I'd recommend more than 110. Maybe 200 is better. In Crazy Stone, I use
> no limit, and test for superko.
> >
> > Rémi
> >
> > On 26 mars 2011, at 15:32, Daniel Shawul wrote:
> >
> >> Sorry 81 moves was a bad estimate by me. I am actually using 96 moves. I
> will change that to 110 or above
> >> moves and see what effect it has. Also I would take Remi's suggestion
> i.e to bias the move selection process.
> >> For the alpha-beta program , I have a decent move ordering algorithm and
> qsearch. I guess can borrow some from that.
> >>
> >> In the meantime, I found a paper using UCT for chinese checkers and
> other games
> http://www.google.com/url?sa=t&source=web&cd=7&ved=0CD4QFjAG&url=http%3A%2F%2Fweb.cs.du.edu%2F~sturtevant%2Fpapers%2Fmpuct_icga.pdf&rct=j&q=UCT%20for%20checkers&ei=mPiNTcqdBsSO0QG20-nWCw&usg=AFQjCNGFgMMMG8xMtawvx-3rQtwXPhfWxQ&cad=rja<http://www.google.com/url?sa=t&source=web&cd=7&ved=0CD4QFjAG&url=http%3A%2F%2Fweb.cs.du.edu%2F%7Esturtevant%2Fpapers%2Fmpuct_icga.pdf&rct=j&q=UCT%20for%20checkers&ei=mPiNTcqdBsSO0QG20-nWCw&usg=AFQjCNGFgMMMG8xMtawvx-3rQtwXPhfWxQ&cad=rja>,
> and also
> >> some fun java programs using UCT for checkers. It seems UCT is indeed
> competitive in checkers.
> >> I must say I didn't expect that at all. I think the forced nature of
> captures helps to improve tactical awareness of the MC simulations.
> >> Is that so ?
> >>
> >>
> >> On Sat, Mar 26, 2011 at 8:52 AM, Erik van der Werf <
> [email protected]> wrote:
> >> Ah ok, I misunderstood.
> >>
> >> Still something seems to be wrong. On the empty 9x9 board I think most
> >> programs with random/light playouts play in the order of 110 moves.
> >> ~81 moves seems quite low; in my experience you can only get such low
> >> numbers to work well if you have a lot of knowledge in your playouts.
> >> Did you check the quality of the evaluations/playouts?
> >>
> >> If you want UCT to search deeper you need good priors and perhaps
> >> something like rave/amaf.
> >>
> >> Best,
> >> Erik
> >>
> >>
> >> On Sat, Mar 26, 2011 at 1:13 PM, Daniel Shawul <[email protected]>
> wrote:
> >> > Hello,
> >> > I am using monte carlo playouts for the UCT method. It can do about
> 10k/sec.
> >> > The UCT tree is expanded to a depth of  d = 3 in a 5 sec search, from
> then
> >> > onwards a random playout (with no bias)
> >> > is carried out.  Actually it is a 'patial playout' which doesn't go to
> the
> >> > end of the game, rather upto a depth of MAX_PLY=96.
> >> >  If the game has ended earlier, then a win/draw/loss is returned.
> Otherwise
> >> > I  forcefully end the game by using a determinstic eval
> >> > and assign a WDL. For 9x9 go actually most of random playouts end
> before
> >> > move 81.
> >> > For the alpha-beta searcher , I do classical evaluation. With heavy
> use of
> >> > reductions
> >> > I can get a depth of 14 half plies , which seems to give it quite an
> edge
> >> > against the UCT version.
> >> > Is the depth of expansion for the UCT tree too low ? (d = 3 in a 5 sec
> >> > search). Should I lower the UCTK parameter
> >> > to 0.1 or so which seems to give me a depth = 7 at the start positon
> of a
> >> > 9x9 go. I am confident my implementation is
> >> > correct because it is working quite well in my checkers program
> despite my
> >> > expectation.
> >> > thanks
> >> > Daniel
> >> >
> >> > On Sat, Mar 26, 2011 at 7:54 AM, Erik van der Werf
> >> > <[email protected]> wrote:
> >> >>
> >> >> It sounds like you're using a classical (deterministic) evaluation
> >> >> function.
> >> >> Try combining UCT with Monte Carlo evaluation.
> >> >>
> >> >> Erik
> >> >>
> >> >>
> >> >> On Sat, Mar 26, 2011 at 12:43 PM, Daniel Shawul <[email protected]>
> wrote:
> >> >> > Hello,
> >> >> > I am very new to UCT,  just implemented basic UCT for go yesterday.
> >> >> > But with no success so far for GO,I think  mostly because it
> searches
> >> >> > not
> >> >> > very deep (depth = 3 on a 5 sec search with those values).
> >> >> > I am using the following values as UCT parameters
> >> >> > UCTK = sqrt(1/5) = 0.44     UCTN = 10 (visits afte which best move
> is
> >> >> > expanded)
> >> >> > Even if I lower UCTK down to 7 I get a maximum depth of d=7 at the
> start
> >> >> > position for a 5 sec search.
> >> >> > For how deep a search should I tune these parameter for ?
> >> >> > Before UCT,  I have an alpha-beta searcher which sometimes plays on
> >> >> > CGOS.
> >> >> > It reached a level of ~1500, and this engine seems to be too strong
> for
> >> >> > the
> >> >> > UCT version.
> >> >> >  It just gets outsearched in some tactical positions and also in
> >> >> > evaluation
> >> >> > I think.
> >> >> > For example, I have an evaluation term which gives big bonuses for
> >> >> > connected
> >> >> > strings which seems
> >> >> > to give an edge in a lot of games.. How do you introduce such eval
> terms
> >> >> > in
> >> >> > UCT ?
> >> >> > But for my checkers program , to my big surprise , UCT made a
> >> >> > significant
> >> >> > impact. The regular
> >> >> > alpha-beta searcher averages a depth=25 but the UCT version I think
> is
> >> >> > equally strong from the games
> >> >> > I saw. That was a kind of surprise for me because I thought UCT
> would
> >> >> > work
> >> >> > better for bushy trees and
> >> >> > when the eval has a lot of strategy. It also reached good depths
> >> >> > averaging
> >> >> > 16 plies .
> >> >> > My checkers eval had only material in it, so I don't know if UCT
> >> >> > is bringing
> >> >> > strategy (distant information) to the game
> >> >> > which the other one don't have.The games are not really played out
> to
> >> >> > the
> >> >> > end rather to a MAX_PLY = 96
> >> >> > afte which the material is counted and a WDL score is assigned (I
> call
> >> >> > it
> >> >> > partial playout).
> >> >> > Also the fact that captures are forced seem to help a lot because
> it
> >> >> > doesn't
> >> >> > make too many mistakes.
> >> >> > I also found out some positions where it encounters similar
> problems as
> >> >> > ladders in go. But in the checkers case,
> >> >> > this problems are still solved correctly. Only problem is that it
> >> >> > doesn't
> >> >> > report correct looking winning rates.
> >> >> > For example, in a position with two kings where one of the kings is
> >> >> > chasing
> >> >> > the other to the sides to mate it, but
> >> >> > the loosing king can draw by making a serious of correct moves to
> get
> >> >> > itself
> >> >> > to one of the safe corners; The program
> >> >> > displays winning rates of 0.01 (when it should have been more like
> 0.5)
> >> >> > but
> >> >> > it still manages the draw !
> >> >> > thanks and apologies for the verbose email
> >> >> > Daniel
> >> >> > _______________________________________________
> >> >> > Computer-go mailing list
> >> >> > [email protected]
> >> >> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> >> >> >
> >> >> _______________________________________________
> >> >> Computer-go mailing list
> >> >> [email protected]
> >> >> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> >> >
> >> >
> >> > _______________________________________________
> >> > Computer-go mailing list
> >> > [email protected]
> >> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> >> >
> >> _______________________________________________
> >> Computer-go mailing list
> >> [email protected]
> >> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> >>
> >> _______________________________________________
> >> Computer-go mailing list
> >> [email protected]
> >> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> >
> > _______________________________________________
> > Computer-go mailing list
> > [email protected]
> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
> >
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>

_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Re: [Computer-go] UCT parameters and application to other games

Reply via email to