On Fri, Jun 24, 2011 at 10:09 AM, Álvaro Begué <[email protected]>wrote:
> On Fri, Jun 24, 2011 at 8:27 AM, Don Dailey <[email protected]> wrote: > > > > On Fri, Jun 24, 2011 at 1:40 AM, Darren Cook <[email protected]> wrote: > >> > >> On 2011-06-21 02:53, Álvaro Begué wrote: > >> > I think when you have explored a node enough times, there is no point > >> > in considering the score of the node to be the average of all the > >> > tries, but it should really be just the score of the best child (i.e., > >> > the minimax rule). UCT does converge to that value, but it does so by > >> > reducing exploration of inferior moves, which results in the > >> > long-lines behavior I just described. > >> > > >> > Perhaps there is a role for alpha-beta near the root of the tree when > >> > we have enough CPU, and that might scale better. > > > > Referring to what Álvaro Begué wrote, I think the idea is sound, MCTS > > really > > is mini-max and considering the score of the other nodes is just a noise > > reduction technique which is not strictly necessary. When there are > few > > samples it's surely a benefit but many not when there are many. Perhaps > > the influence of sibling scores should be gradually removed? I guess > in > > practice that is what happens when one move is so popular others rarely > > get played. > > You see the problem with that last sentence, don't you? I think I understand what you are getting at. I'm not so sure alpha/beta pruning cannot work for go - modern chess programs are extremely aggressive about reducing moves that appear weak and this model is not so different than what go programs do. Essentially it works like this: 1. Play the first N moves with normal depth. N may only be 1 or 2 moves. 2. The siblings are "reduced" - if you are doing a 10 ply search you might only search the siblings 5 ply for instance. 3. If one of the siblings comes back with a score about alpha, search it again full depth. I left out a lot of detail, but this does depend on having good move ordering and in chess there are some conditions where you might not reduce a move (or reduce it less), even if it's late in the list (such as if it gives check or is a capture.) This has the potential in GO of enormously reducing the search tree. In GO it's hardly any different with MCTS, you focus on 1 moves and give only minor consideration to the rest unless you find unexpectedly that it is good after all. With a lot of work and enhancements specific to GO I don't know why this wouldn't work. Maybe it wouldn't, but I currently cannot see a good reason why it wouldn't. One problem is the slow playout-based evaluation function. It seems to be important for statistical reasons to combine the results of all the siblings when the playout count is low. But perhaps at the lower depths we do that, but at higher depths we use alpha/beta pruning with aggressive move reductions. I had a discussion about a year ago with Gian Carlo about this at the Dutch chess tournament in Leiden and we both speculated that when it's all said and done we may very well find out that conceptually there is no real difference. In chess there was once all these different kinds of searches but Aske Platt did a thesis that showed they could all be expressed inside a simple framework that showed the differences were more minor that we had imagined. Don > I don't know > much about go, so perhaps I should use a queen sacrifice in chess as > an example... No, I'll try with go anyway. Imagine a situation where > my opponent just attacked a large group of mine. It turns out the move > doesn't need a response (gote), but it is not at all obvious that it > doesn't need a response. A good tenuki will not be explored much by > MCTS, because initially it looks like its score is much worse than > that of a more defensive response which clearly saves the group, and > therefore it would take MCTS an impractical amount of time to change > its mind. At the root we could just decide to be more exploratory and > give apparently weak moves more of a chance. But if we do this at > nodes that are not the root, all the exploration will pollute the > score that we return. > > In minimax, we give every move a similar chance*, but then we don't > spend much time searching obviously bad moves because alpha-beta does > a great job of proving they are inferior to the best move we've found. > If a move returns a very bad score (queen sacrifice), we may still > spend a lot of time on it, if proving that it's a bad move is hard. > And the important feature missing in MCTS is that the score we return > doesn't suffer because we spent time on that move that turned out to > be bad. > > I believe this is the main reason programs are not good at reading > semeais. Any situation in which a move's score might jump enormously > when we discover the correct follow-up moves will probably give MCTS > trouble. Of course fixing this will do nothing to help in situations > where the playouts misread the status of some group. That's more like > having a bad evaluation function with mistakes that persist for many > moves. > > (*) - To be completely correct, bad moves should get penalized in > depth something proportional to -log(probability of being the right > move), and some techniques like LMR (late move reductions) approximate > that, but as the search depth increases they will get explored deeper > and deeper anyway, and eventually (in a practical amount of time) > minimax will see their merit. > > > > >> > >> One of my favourite subjects. I noticed, about 3-4 years ago now, from > >> my sm9 (human-computer team 9x9 go) experiments that an MCTS program > >> would usually have chosen a strong move after say 50,000 playouts, but > >> when it was wrong it typically would still be wrong after a million > >> playouts. (Very subjective, sorry Don ;-) > > > > I have no problem with empirical and subject observations when it's used > to > > for > > hypothesis building. But once you view something as a fact then you > need > > to > > be correct because now new ideas are built upon it and then you could > have > > a > > mess! For example once you accept as fact that the earth is the > center > > of > > the universe you cannot make further progress in understanding the > universe. > > > > > >> > >> Hence the proposal to use alpha-beta as the top-level search, using MCTS > >> with about 50K playouts at the nodes. I've done a few experiments in > >> this direction, and I still think it is very promising. Technically the > >> current state of sm9 automation is minimax on top of 4 MCTS and one > >> traditional go program. (But very few nodes in the minimax tree as I > >> give each program a few minutes of CPU time for every move.) > >> > >> Darren > >> > >> > >> -- > >> Darren Cook, Software Researcher/Developer > >> > >> http://dcook.org/work/ (About me and my work) > >> http://dcook.org/blogs.html (My blogs and articles) > >> _______________________________________________ > >> Computer-go mailing list > >> [email protected] > >> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go > > > > _______________________________________________ > > Computer-go mailing list > > [email protected] > > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go > > > _______________________________________________ > Computer-go mailing list > [email protected] > http://dvandva.org/cgi-bin/mailman/listinfo/computer-go >
_______________________________________________ Computer-go mailing list [email protected] http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
