On Fri, Jun 24, 2011 at 8:27 AM, Don Dailey <[email protected]> wrote: > > On Fri, Jun 24, 2011 at 1:40 AM, Darren Cook <[email protected]> wrote: >> >> On 2011-06-21 02:53, Álvaro Begué wrote: >> > I think when you have explored a node enough times, there is no point >> > in considering the score of the node to be the average of all the >> > tries, but it should really be just the score of the best child (i.e., >> > the minimax rule). UCT does converge to that value, but it does so by >> > reducing exploration of inferior moves, which results in the >> > long-lines behavior I just described. >> > >> > Perhaps there is a role for alpha-beta near the root of the tree when >> > we have enough CPU, and that might scale better. > > Referring to what Álvaro Begué wrote, I think the idea is sound, MCTS > really > is mini-max and considering the score of the other nodes is just a noise > reduction technique which is not strictly necessary. When there are few > samples it's surely a benefit but many not when there are many. Perhaps > the influence of sibling scores should be gradually removed? I guess in > practice that is what happens when one move is so popular others rarely > get played.
You see the problem with that last sentence, don't you? I don't know much about go, so perhaps I should use a queen sacrifice in chess as an example... No, I'll try with go anyway. Imagine a situation where my opponent just attacked a large group of mine. It turns out the move doesn't need a response (gote), but it is not at all obvious that it doesn't need a response. A good tenuki will not be explored much by MCTS, because initially it looks like its score is much worse than that of a more defensive response which clearly saves the group, and therefore it would take MCTS an impractical amount of time to change its mind. At the root we could just decide to be more exploratory and give apparently weak moves more of a chance. But if we do this at nodes that are not the root, all the exploration will pollute the score that we return. In minimax, we give every move a similar chance*, but then we don't spend much time searching obviously bad moves because alpha-beta does a great job of proving they are inferior to the best move we've found. If a move returns a very bad score (queen sacrifice), we may still spend a lot of time on it, if proving that it's a bad move is hard. And the important feature missing in MCTS is that the score we return doesn't suffer because we spent time on that move that turned out to be bad. I believe this is the main reason programs are not good at reading semeais. Any situation in which a move's score might jump enormously when we discover the correct follow-up moves will probably give MCTS trouble. Of course fixing this will do nothing to help in situations where the playouts misread the status of some group. That's more like having a bad evaluation function with mistakes that persist for many moves. (*) - To be completely correct, bad moves should get penalized in depth something proportional to -log(probability of being the right move), and some techniques like LMR (late move reductions) approximate that, but as the search depth increases they will get explored deeper and deeper anyway, and eventually (in a practical amount of time) minimax will see their merit. > >> >> One of my favourite subjects. I noticed, about 3-4 years ago now, from >> my sm9 (human-computer team 9x9 go) experiments that an MCTS program >> would usually have chosen a strong move after say 50,000 playouts, but >> when it was wrong it typically would still be wrong after a million >> playouts. (Very subjective, sorry Don ;-) > > I have no problem with empirical and subject observations when it's used to > for > hypothesis building. But once you view something as a fact then you need > to > be correct because now new ideas are built upon it and then you could have > a > mess! For example once you accept as fact that the earth is the center > of > the universe you cannot make further progress in understanding the universe. > > >> >> Hence the proposal to use alpha-beta as the top-level search, using MCTS >> with about 50K playouts at the nodes. I've done a few experiments in >> this direction, and I still think it is very promising. Technically the >> current state of sm9 automation is minimax on top of 4 MCTS and one >> traditional go program. (But very few nodes in the minimax tree as I >> give each program a few minutes of CPU time for every move.) >> >> Darren >> >> >> -- >> Darren Cook, Software Researcher/Developer >> >> http://dcook.org/work/ (About me and my work) >> http://dcook.org/blogs.html (My blogs and articles) >> _______________________________________________ >> Computer-go mailing list >> [email protected] >> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go > > _______________________________________________ > Computer-go mailing list > [email protected] > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go > _______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
