Re: [Computer-go] UCT parameters and application to other games

Erik van der Werf Sat, 26 Mar 2011 08:34:35 -0700

I agree, if you use a hard limit it should be much higher (probably
something like twice the board surface is ok).


110 moves is just an observation of the average playout length for the
empty 9x9 board. With smarter playouts that average tends to become
lower, but the distribution may still have a long tail.

Erik


On Sat, Mar 26, 2011 at 3:36 PM, Rémi Coulom <[email protected]> wrote:
> I'd recommend more than 110. Maybe 200 is better. In Crazy Stone, I use no 
> limit, and test for superko.
>
> Rémi
>
> On 26 mars 2011, at 15:32, Daniel Shawul wrote:
>
>> Sorry 81 moves was a bad estimate by me. I am actually using 96 moves. I 
>> will change that to 110 or above
>> moves and see what effect it has. Also I would take Remi's suggestion i.e to 
>> bias the move selection process.
>> For the alpha-beta program , I have a decent move ordering algorithm and 
>> qsearch. I guess can borrow some from that.
>>
>> In the meantime, I found a paper using UCT for chinese checkers and other 
>> games 
>> http://www.google.com/url?sa=t&source=web&cd=7&ved=0CD4QFjAG&url=http%3A%2F%2Fweb.cs.du.edu%2F~sturtevant%2Fpapers%2Fmpuct_icga.pdf&rct=j&q=UCT%20for%20checkers&ei=mPiNTcqdBsSO0QG20-nWCw&usg=AFQjCNGFgMMMG8xMtawvx-3rQtwXPhfWxQ&cad=rja,
>>  and also
>> some fun java programs using UCT for checkers. It seems UCT is indeed 
>> competitive in checkers.
>> I must say I didn't expect that at all. I think the forced nature of 
>> captures helps to improve tactical awareness of the MC simulations.
>> Is that so ?
>>
>>
>> On Sat, Mar 26, 2011 at 8:52 AM, Erik van der Werf 
>> <[email protected]> wrote:
>> Ah ok, I misunderstood.
>>
>> Still something seems to be wrong. On the empty 9x9 board I think most
>> programs with random/light playouts play in the order of 110 moves.
>> ~81 moves seems quite low; in my experience you can only get such low
>> numbers to work well if you have a lot of knowledge in your playouts.
>> Did you check the quality of the evaluations/playouts?
>>
>> If you want UCT to search deeper you need good priors and perhaps
>> something like rave/amaf.
>>
>> Best,
>> Erik
>>
>>
>> On Sat, Mar 26, 2011 at 1:13 PM, Daniel Shawul <[email protected]> wrote:
>> > Hello,
>> > I am using monte carlo playouts for the UCT method. It can do about 
>> > 10k/sec.
>> > The UCT tree is expanded to a depth of  d = 3 in a 5 sec search, from then
>> > onwards a random playout (with no bias)
>> > is carried out.  Actually it is a 'patial playout' which doesn't go to the
>> > end of the game, rather upto a depth of MAX_PLY=96.
>> >  If the game has ended earlier, then a win/draw/loss is returned. Otherwise
>> > I  forcefully end the game by using a determinstic eval
>> > and assign a WDL. For 9x9 go actually most of random playouts end before
>> > move 81.
>> > For the alpha-beta searcher , I do classical evaluation. With heavy use of
>> > reductions
>> > I can get a depth of 14 half plies , which seems to give it quite an edge
>> > against the UCT version.
>> > Is the depth of expansion for the UCT tree too low ? (d = 3 in a 5 sec
>> > search). Should I lower the UCTK parameter
>> > to 0.1 or so which seems to give me a depth = 7 at the start positon of a
>> > 9x9 go. I am confident my implementation is
>> > correct because it is working quite well in my checkers program despite my
>> > expectation.
>> > thanks
>> > Daniel
>> >
>> > On Sat, Mar 26, 2011 at 7:54 AM, Erik van der Werf
>> > <[email protected]> wrote:
>> >>
>> >> It sounds like you're using a classical (deterministic) evaluation
>> >> function.
>> >> Try combining UCT with Monte Carlo evaluation.
>> >>
>> >> Erik
>> >>
>> >>
>> >> On Sat, Mar 26, 2011 at 12:43 PM, Daniel Shawul <[email protected]> wrote:
>> >> > Hello,
>> >> > I am very new to UCT,  just implemented basic UCT for go yesterday.
>> >> > But with no success so far for GO,I think  mostly because it searches
>> >> > not
>> >> > very deep (depth = 3 on a 5 sec search with those values).
>> >> > I am using the following values as UCT parameters
>> >> > UCTK = sqrt(1/5) = 0.44     UCTN = 10 (visits afte which best move is
>> >> > expanded)
>> >> > Even if I lower UCTK down to 7 I get a maximum depth of d=7 at the start
>> >> > position for a 5 sec search.
>> >> > For how deep a search should I tune these parameter for ?
>> >> > Before UCT,  I have an alpha-beta searcher which sometimes plays on
>> >> > CGOS.
>> >> > It reached a level of ~1500, and this engine seems to be too strong for
>> >> > the
>> >> > UCT version.
>> >> >  It just gets outsearched in some tactical positions and also in
>> >> > evaluation
>> >> > I think.
>> >> > For example, I have an evaluation term which gives big bonuses for
>> >> > connected
>> >> > strings which seems
>> >> > to give an edge in a lot of games.. How do you introduce such eval terms
>> >> > in
>> >> > UCT ?
>> >> > But for my checkers program , to my big surprise , UCT made a
>> >> > significant
>> >> > impact. The regular
>> >> > alpha-beta searcher averages a depth=25 but the UCT version I think is
>> >> > equally strong from the games
>> >> > I saw. That was a kind of surprise for me because I thought UCT would
>> >> > work
>> >> > better for bushy trees and
>> >> > when the eval has a lot of strategy. It also reached good depths
>> >> > averaging
>> >> > 16 plies .
>> >> > My checkers eval had only material in it, so I don't know if UCT
>> >> > is bringing
>> >> > strategy (distant information) to the game
>> >> > which the other one don't have.The games are not really played out to
>> >> > the
>> >> > end rather to a MAX_PLY = 96
>> >> > afte which the material is counted and a WDL score is assigned (I call
>> >> > it
>> >> > partial playout).
>> >> > Also the fact that captures are forced seem to help a lot because it
>> >> > doesn't
>> >> > make too many mistakes.
>> >> > I also found out some positions where it encounters similar problems as
>> >> > ladders in go. But in the checkers case,
>> >> > this problems are still solved correctly. Only problem is that it
>> >> > doesn't
>> >> > report correct looking winning rates.
>> >> > For example, in a position with two kings where one of the kings is
>> >> > chasing
>> >> > the other to the sides to mate it, but
>> >> > the loosing king can draw by making a serious of correct moves to get
>> >> > itself
>> >> > to one of the safe corners; The program
>> >> > displays winning rates of 0.01 (when it should have been more like 0.5)
>> >> > but
>> >> > it still manages the draw !
>> >> > thanks and apologies for the verbose email
>> >> > Daniel
>> >> > _______________________________________________
>> >> > Computer-go mailing list
>> >> > [email protected]
>> >> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>> >> >
>> >> _______________________________________________
>> >> Computer-go mailing list
>> >> [email protected]
>> >> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>> >
>> >
>> > _______________________________________________
>> > Computer-go mailing list
>> > [email protected]
>> > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>> >
>> _______________________________________________
>> Computer-go mailing list
>> [email protected]
>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>>
>> _______________________________________________
>> Computer-go mailing list
>> [email protected]
>> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go

Re: [Computer-go] UCT parameters and application to other games

Reply via email to