On 2011-06-21 02:53, Álvaro Begué wrote:
> I think when you have explored a node enough times, there is no point
> in considering the score of the node to be the average of all the
> tries, but it should really be just the score of the best child (i.e.,
> the minimax rule). UCT does converge to that value, but it does so by
> reducing exploration of inferior moves, which results in the
> long-lines behavior I just described.
> 
> Perhaps there is a role for alpha-beta near the root of the tree when
> we have enough CPU, and that might scale better.

One of my favourite subjects. About 3-4 years ago, in my sm9
(human-computer team 9x9 go) experiments, I noticed that an MCTS program
had usually chosen a strong move after, say, 50,000 playouts, but when
it was wrong it was typically still wrong after a million playouts.
(Very subjective, sorry Don ;-)

Hence the proposal to use alpha-beta as the top-level search, with MCTS
of about 50K playouts evaluating each node. I've done a few experiments
in this direction, and I still think it is very promising. Technically,
the current state of sm9 automation is minimax on top of four MCTS
programs and one traditional go program. (But there are very few nodes
in the minimax tree, as I give each program a few minutes of CPU time
for every move.)
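
To make the idea concrete, here is a minimal sketch of alpha-beta on top
with a playout-based evaluation at the leaves. It is not the sm9 code.
Everything in it is an assumption made for illustration: the toy game
(one pile of stones, take 1-3, whoever takes the last stone wins) stands
in for go, flat uniformly random playouts stand in for a real MCTS
engine, and the names legal_moves, playout_value, alphabeta and
choose_move are hypothetical.

```python
import random

def legal_moves(stones):
    # Toy game (an assumption, not go): take 1, 2 or 3 stones from one pile.
    return [m for m in (1, 2, 3) if m <= stones]

def playout_value(stones, playouts, rng):
    """Win rate for the side to move, from uniformly random playouts.

    Stand-in for "MCTS with about 50K playouts"; a real engine goes here.
    """
    if stones == 0:
        return 0.0  # the opponent took the last stone, so the side to move lost
    wins = 0
    for _ in range(playouts):
        s, mover_is_us = stones, True
        while True:
            s -= rng.choice(legal_moves(s))
            if s == 0:
                wins += mover_is_us  # whoever just moved took the last stone
                break
            mover_is_us = not mover_is_us
    return wins / playouts

def alphabeta(stones, depth, alpha, beta, playouts, rng):
    """Negamax alpha-beta over win probabilities for the side to move."""
    if depth == 0 or stones == 0:
        return playout_value(stones, playouts, rng)
    best = 0.0
    for m in legal_moves(stones):
        # The opponent's win rate v is 1 - v for us; the window flips the same way.
        value = 1.0 - alphabeta(stones - m, depth - 1,
                                1.0 - beta, 1.0 - alpha, playouts, rng)
        best = max(best, value)
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # cutoff: the opponent will not allow this line
    return best

def choose_move(stones, depth=2, playouts=200, seed=0):
    # Root: back up the best child (minimax rule), not an average of all tries.
    rng = random.Random(seed)
    scored = [(1.0 - alphabeta(stones - m, depth - 1, 0.0, 1.0, playouts, rng), m)
              for m in legal_moves(stones)]
    return max(scored)[1]

print(choose_move(5))  # number of stones to take from a pile of 5
```

The point of the design is that the top levels back up the best child
rather than an average, so a refutation found two plies down changes the
root choice straight away. The cost is the one I mention above: each
leaf is a full playout budget, so the tree can only hold a few nodes.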

Darren


-- 
Darren Cook, Software Researcher/Developer

http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)