On 2011-06-21 02:53, Álvaro Begué wrote:
> I think when you have explored a node enough times, there is no point
> in considering the score of the node to be the average of all the
> tries, but it should really be just the score of the best child (i.e.,
> the minimax rule). UCT does converge to that value, but it does so by
> reducing exploration of inferior moves, which results in the
> long-lines behavior I just described.
>
> Perhaps there is a role for alpha-beta near the root of the tree when
> we have enough CPU, and that might scale better.

One of my favourite subjects. I noticed, about 3-4 years ago now, from
my sm9 (human-computer team 9x9 go) experiments that an MCTS program
would usually have chosen a strong move after, say, 50,000 playouts, but
when it was wrong it would typically still be wrong after a million
playouts. (Very subjective, sorry Don ;-)

Hence the proposal to use alpha-beta as the top-level search, using MCTS
with about 50K playouts at the nodes. I've done a few experiments in
this direction, and I still think it is very promising.

Technically the current state of sm9 automation is minimax on top of 4
MCTS programs and one traditional go program. (But there are very few
nodes in the minimax tree, as I give each program a few minutes of CPU
time for every move.)

Darren

-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
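
A minimal sketch of the shape of that proposal, alpha-beta as the
top-level search with a playout-based evaluator at the leaves. Everything
here is an assumption made for illustration and not the sm9 setup: the
game is a toy (one pile of stones, take 1 to 3, taking the last stone
wins), the leaf evaluator is plain uniformly random playouts standing in
for a real MCTS engine, and the names legal_moves, playout_value,
alphabeta and choose_move are hypothetical.

```python
import random

# Toy game (assumed for illustration): one pile of stones, each move
# removes 1 to 3 stones, and whoever takes the last stone wins.
def legal_moves(stones):
    return [n for n in (1, 2, 3) if n <= stones]

def playout_value(stones, playouts, rng):
    """Stand-in for an MCTS engine: mean result of uniformly random
    playouts, seen from the player to move (+1 win, -1 loss)."""
    if stones == 0:
        return -1.0  # the previous player took the last stone
    total = 0
    for _ in range(playouts):
        s, to_move = stones, 0  # player 0 is the player to move at the leaf
        while True:
            s -= rng.choice(legal_moves(s))
            if s == 0:
                total += 1 if to_move == 0 else -1
                break
            to_move = 1 - to_move
    return total / playouts

def alphabeta(stones, depth, alpha, beta, playouts, rng):
    """Negamax alpha-beta; leaves are scored by the playout evaluator."""
    if depth == 0 or stones == 0:
        return playout_value(stones, playouts, rng)
    best = -float("inf")
    for move in legal_moves(stones):
        score = -alphabeta(stones - move, depth - 1, -beta, -alpha, playouts, rng)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the opponent will not allow this line
    return best

def choose_move(stones, depth=2, playouts=200, seed=0):
    """Pick the root move with the best alpha-beta score."""
    rng = random.Random(seed)
    best_move, best_score = None, -float("inf")
    for move in legal_moves(stones):
        score = -alphabeta(stones - move, depth - 1,
                           -float("inf"), float("inf"), playouts, rng)
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```

The depth and playouts arguments trade off in the way described above: a
deeper top-level tree means fewer playouts per leaf for the same CPU
budget. In this toy, once depth is at least the number of stones the
search reaches the end of the game on every line and the playouts are
never consulted.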
