Hi Olivier,

Now I know Remi was the first to use MCTS. I guess I need to read the papers more carefully. I do have a question, though. I thought UCT was the foundation of the current strong programs. I know that a RAVE term is added to the original UCB term, i.e. sqrt(log(t_total)/t_i), but the UCB term is still there, right? Could you elaborate a bit on why you say "UCT is not good for Go"? This seems to contradict a lot of material on the internet about the latest breed of Go programs.
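To make the question concrete, here is a minimal sketch of the move score as I understand it. This is only my assumption, not MoGo's actual formula: the function names (ucb_bonus, move_score), the constant c, and the beta schedule with its parameter k are hypothetical choices for illustration. Setting c = 0 gives what I take "MCTS without UCT" to mean, i.e. the RAVE-blended mean with no exploration bonus.

```python
import math


def ucb_bonus(t_total, t_i, c=1.0):
    """UCB-style exploration bonus c * sqrt(log(t_total) / t_i).

    c is a hypothetical tuning constant; c = 0 switches the bonus off.
    """
    if c == 0.0:
        return 0.0
    if t_i == 0:
        # An unvisited move gets an infinite bonus so it is tried once.
        return math.inf
    return c * math.sqrt(math.log(t_total) / t_i)


def move_score(wins, t_i, rave_wins, rave_t, t_total, c=1.0, k=1000.0):
    """Blend the Monte-Carlo mean with the RAVE mean, then add the bonus.

    The weight beta = sqrt(k / (3 * t_total + k)) is an assumed schedule:
    it starts near 1 (trust RAVE) and decays as simulations accumulate.
    """
    mc_mean = wins / t_i if t_i > 0 else 0.0
    rave_mean = rave_wins / rave_t if rave_t > 0 else 0.0
    beta = math.sqrt(k / (3.0 * t_total + k))
    blended = (1.0 - beta) * mc_mean + beta * rave_mean
    return blended + ucb_bonus(t_total, t_i, c)


if __name__ == "__main__":
    # Same statistics, with and without the exploration term.
    print(move_score(6, 10, 8, 10, t_total=100, c=1.0, k=100.0))
    print(move_score(6, 10, 8, 10, t_total=100, c=0.0, k=100.0))
```

If I read your message correctly, the strong programs effectively run with c at or near 0 and rely on the RAVE and pattern terms to drive exploration. Is that the right picture?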
Regards,
Fuming

On Fri, Dec 31, 2010 at 5:51 PM, Olivier Teytaud <[email protected]> wrote:
>
>
> Dear all,
> the original MCTS paper is by Rémi Coulom (to the best of my knowledge at
> least...). It's clear for us that we did not invent MCTS
> and always referenced Remi's paper.
>
> *1) For UCB-like formula:*
> - On the theoretical side, the consistency proof of MCTS without the
> UCT-like exploration also comes from mogo-people (Berthier et al); instead
> of saying that mogo's contribution is the introduction of UCT (which is good
> for other games but not for Go), I would
> have said that mogo's contribution is the analysis of MCTS without UCT
> (even if MCTS without UCT existed before mogo).
> - I'd like to point out that UCB-formulas are, I think, good for games with
> random part (e.g. with random transition, or with hidden information which
> leads to randomized strategies). But not for Go :-)
>
> *2) For works on the Monte-Carlo part:*
> For MoGo's contributions on the Monte-Carlo part, in particular with
> Yizao's patterns (with other people as well). There
> was a significant difference with previous attempts of designing good
> Monte-Carlo parts in the sense that a good Monte-Carlo part
> is not a MC-part which plays well as a standalone player, but a MC-part
> which plays well with a MCTS on top of it; this is unfortunately not very
> convenient as a criterion for designing a MC part... I think the main idea
> was the idea of balancing - the situation should not be better for one of
> the two players after one move by each player. This was claimed already in
> Sylvain's thesis and (I think) earlier than that.
>
> *3) Other mogo's contributions (I might forget many things...)*
> are around the fillboard option (which has a great impact for us on MoGo in
> 19x19), the nakade (maybe there were other simultaneous published methods
> for that),
> and the RAVE part (Brugmann, Gelly, Silver - Aja said that RAVE was
> invented by David and I have no idea on that, but I'm sure Sylvain
> contributed a lot on this and Brugmann did something ),
> the parallelization (Tristan Cazenave and others have published similar
> ideas; the Bourki et al paper has shown clear limitations in terms of
> scalability and counter-examples), the automatic building of
> patterns by direct policy search (J.-B. Hoock's papers) which can be used
> far from Go,
> the simultaneous use of
> patterns designed by supervised learning, patterns designed by policy
> search,
> rave values, expert knowledge with a dirty complicated formula :-)
>
> MoGo was also, I think, the first use of never-ending learning for
> designing automatically an opening book by MCTS (this was
> moderately good at first because we did not want to use expert knowledge at
> all, whereas human expertise was really necessary
> for guiding the search...), after months of work on a grid this provides
> very good results for 9x9 Go.
>
> The parallelization is only efficient for moderate time settings -
> otherwise we have the scalability plateau. For games with very expensive
> transitions it might be different...
>
> Best regards,
> Olivier
>
> _______________________________________________
> Computer-go mailing list
> [email protected]
> http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
>
