Dear all,
  the original MCTS paper is by Rémi Coulom (to the best of my knowledge, at
least...). It is clear to us that we did not invent MCTS, and we have
always referenced Rémi's paper.

*1) On the UCB-like formula:*
- On the theoretical side, the consistency proof of MCTS without the
UCT-like exploration also comes from the MoGo team (Berthier et al.). Rather
than saying that MoGo's contribution is the introduction of UCT (which is
good for other games but not for Go), I would say that MoGo's contribution
is the analysis of MCTS without UCT (even though MCTS without UCT existed
before MoGo).
- I'd like to point out that UCB formulas are, I think, good for games with
a random component (e.g. random transitions, or hidden information that
leads to randomized strategies). But not for Go :-)
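To make the distinction concrete, here is a minimal sketch of child selection
with and without a UCB-style exploration bonus. It uses the textbook UCB1
formula, not MoGo's actual selection rule; the names `ucb1_score` and
`select_child`, the `(wins, visits)` layout and the constant `c` are
assumptions made for illustration only. Setting `c = 0` removes the
exploration term and leaves selection on the empirical mean alone.

```python
import math


def ucb1_score(wins, visits, parent_visits, c):
    """Textbook UCB1 score: empirical mean plus an exploration bonus.

    Unvisited children get +inf so they are tried once (a common
    convention, assumed here, not taken from the email).
    """
    if visits == 0:
        return math.inf
    mean = wins / visits
    return mean + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c):
    """Pick the move with the highest score.

    children: dict mapping move -> (wins, visits)  (hypothetical layout)
    c:        exploration constant; c = 0 means no UCB-style exploration
    """
    parent_visits = sum(visits for _, visits in children.values())
    return max(
        children,
        key=lambda move: ucb1_score(*children[move], parent_visits, c),
    )


if __name__ == "__main__":
    stats = {"a": (60, 100), "b": (5, 10), "c": (1, 3)}
    print("no exploration:", select_child(stats, 0.0))
    print("with UCB bonus:", select_child(stats, 1.4))
```

With the bonus switched on, the rarely visited move is preferred even though
its mean is the lowest; with `c = 0` the best empirical mean wins.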

*2) For works on the Monte-Carlo part:*
MoGo's contributions on the Monte-Carlo part came in particular through
Yizao's patterns (with other people as well). The significant difference
from previous attempts at designing good Monte-Carlo parts is that a good
MC part is not one which plays well as a standalone player, but one which
plays well with MCTS on top of it; this is unfortunately not a very
convenient criterion for designing an MC part... I think the main idea was
balancing: the situation should not be better for one of the two players
after one move by each player. This was already claimed in Sylvain's thesis
and (I think) earlier than that.

*3) Other MoGo contributions (I might forget many things...):*
- the fillboard option (which has a great impact for us on MoGo in 19x19);
- the nakade (maybe other methods for that were published at the same time);
- the RAVE part (Brugmann, Gelly, Silver; Aja said that RAVE was invented by
David, and I don't know about that, but I'm sure Sylvain contributed a lot
to it and Brugmann did something);
- the parallelization (Tristan Cazenave and others have published similar
ideas; the Bourki et al. paper has shown clear limitations in terms of
scalability, with counter-examples);
- the automatic building of patterns by direct policy search (J.-B. Hoock's
papers), which can be used far from Go;
- the simultaneous use of patterns designed by supervised learning, patterns
designed by policy search, RAVE values and expert knowledge, with a dirty
complicated formula :-)
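As a rough illustration of how RAVE values can be mixed with ordinary
Monte-Carlo statistics, here is a minimal sketch using the hand-tuned weight
schedule beta = sqrt(k / (3n + k)) from the RAVE literature. This is not the
combined formula mentioned above (which also includes patterns and expert
knowledge and is not given here); the function names and the default
`k = 1000` are assumptions for illustration.

```python
import math


def rave_weight(visits, k=1000.0):
    """Weight given to the RAVE estimate after `visits` real simulations.

    Starts at 1 (trust RAVE fully) and decays towards 0 as visits grow.
    `k` is a hypothetical tuning constant: the visit count at which the
    weight has dropped to 0.5.
    """
    return math.sqrt(k / (3.0 * visits + k))


def blended_value(q_mc, n_mc, q_rave, k=1000.0):
    """Mix the Monte-Carlo mean `q_mc` (from `n_mc` simulations) with
    the RAVE mean `q_rave`."""
    beta = rave_weight(n_mc, k)
    return (1.0 - beta) * q_mc + beta * q_rave


if __name__ == "__main__":
    for n in (0, 100, 1000, 100000):
        print(n, round(blended_value(0.2, n, 0.7), 3))
```

The point of such a schedule is that RAVE statistics are available quickly
but are biased, so they dominate early and fade out as real simulations
accumulate.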

MoGo was also, I think, the first use of never-ending learning for
automatically building an opening book with MCTS. This was only moderately
good at first, because we did not want to use expert knowledge at all,
whereas human expertise was really necessary for guiding the search...
After months of work on a grid, this gives very good results for 9x9 Go.

The parallelization is only efficient for moderate time settings; beyond
that we hit the scalability plateau. For games with very expensive
transitions it might be different...

Best regards,
Olivier