On Fri, Jun 24, 2011 at 1:40 AM, Darren Cook <[email protected]> wrote:
> On 2011-06-21 02:53, Álvaro Begué wrote:
> > I think when you have explored a node enough times, there is no point
> > in considering the score of the node to be the average of all the
> > tries, but it should really be just the score of the best child (i.e.,
> > the minimax rule). UCT does converge to that value, but it does so by
> > reducing exploration of inferior moves, which results in the
> > long-lines behavior I just described.
> >
> > Perhaps there is a role for alpha-beta near the root of the tree when
> > we have enough CPU, and that might scale better.
>

Referring to what Álvaro Begué wrote, I think the idea is sound. MCTS
really is minimax, and considering the scores of the other nodes is just
a noise-reduction technique which is not strictly necessary. When there
are few samples it's surely a benefit, but it may not be when there are
many. Perhaps the influence of sibling scores should be gradually
removed? I guess in practice that is what happens when one move is so
popular that the others rarely get played.

>
> One of my favourite subjects. I noticed, about 3-4 years ago now, from
> my sm9 (human-computer team 9x9 go) experiments that an MCTS program
> would usually have chosen a strong move after say 50,000 playouts, but
> when it was wrong it typically would still be wrong after a million
> playouts. (Very subjective, sorry Don ;-)
>

I have no problem with empirical and subjective observations when they
are used for hypothesis building. But once you view something as a fact,
it needs to be correct, because new ideas get built upon it and then you
could have a mess! For example, once you accept as fact that the earth
is the center of the universe, you cannot make further progress in
understanding the universe.

>
> Hence the proposal to use alpha-beta as the top-level search, using MCTS
> with about 50K playouts at the nodes. I've done a few experiments in
> this direction, and I still think it is very promising. Technically the
> current state of sm9 automation is minimax on top of 4 MCTS and one
> traditional go program. (But very few nodes in the minimax tree as I
> give each program a few minutes of CPU time for every move.)
>
> Darren
>
>
> --
> Darren Cook, Software Researcher/Developer
>
> http://dcook.org/work/ (About me and my work)
> http://dcook.org/blogs.html (My blogs and articles)
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
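
To make the "gradually remove the influence of sibling scores" idea
concrete, here is a rough sketch of one way a node's value could slide
from the plain average toward the best child's score as visits
accumulate. It is only an illustration and not taken from any actual
program: the function name blended_value, the constant k, and the
weighting schedule n / (n + k) are all assumptions made up for this
example.

```python
def blended_value(children, k=1000.0):
    """Blend the average of all tries with the best child's mean score.

    children: list of (visits, mean_score) pairs, with scores seen from
              the point of view of the player to move at this node.
    k:        hypothetical constant; larger k keeps the average longer.
    """
    total = sum(visits for visits, _ in children)
    if total == 0:
        raise ValueError("node has no visited children")

    # Usual MCTS backup: visit-weighted average over all children.
    average = sum(visits * score for visits, score in children) / total

    # Minimax backup: mean score of the best visited child.
    best = max(score for visits, score in children if visits > 0)

    # Weight goes from 0 (pure average) toward 1 (pure minimax) as the
    # node is explored more.
    w = total / (total + k)
    return (1.0 - w) * average + w * best


if __name__ == "__main__":
    # One popular move and one rarely played sibling.
    print(blended_value([(900, 0.6), (100, 0.4)]))
```

One caveat with this sketch: taking the maximum over child means can
latch onto a lucky child with very few visits, so a real implementation
would probably want a minimum visit count before a child is allowed to
count as "best".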
