Hello Heikki,

Heikki Levanto: <[EMAIL PROTECTED]>:
>On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
>> Being a computer scientist but new to Go, I can grasp some of the theory.
>> The question I was trying to get across was:
>>
>> In a game of self-play, if both parties are employing only Monte Carlo,
>> surely it's not a good conceptual representation of a human, and if the
>> reinforcement learning is based on random simulations, wouldn't it be very
>> weak when playing a real human?
>
>
>Here is another amateur answering.
>
>The way I understand it, modern Monte Carlo programs do not even try to
>emulate a human player with a random player - obviously that would not work.

I believe CrazyStone's use of patterns does so, and it seems successful.

Hideki

>What they do is that they build a quite traditional search tree starting from
>the current position. They use a random playout as a crude way to evaluate a
>position. Based on this evaluation, they decide which branch of the tree to
>expand.
>
>This is the way I understand the random playouts: If, in a given position,
>white is clearly ahead, he will win the game if both sides play perfect
>moves. He is also likely to win if both sides play reasonably good moves
>(say, like human amateurs), but there is a bit more of a chance that one
>player hits upon a good combination which the other misses, so the result is
>not quite as reliable. If the playouts are totally random, there is still a
>better chance for white to win, because both sides make equally bad moves.
>The results have much more variation, of course. So far it does not sound
>like a very good proposal, but things change if you consider the facts that
>we don't have perfect oracles, and good humans are slow to play out a
>position, and cannot be integrated into a computer program. Whereas random
>playouts can be done awfully fast, tens of thousands of playouts in a second.
>Averaging the results gives a fair indication of who is more likely to win
>from that position, just what is needed to decide which part of the search
>tree to expand.
>
>The 'random' playouts are not totally random, they include a minimum of
>tactical rules (do not fill own eyes, do not pass as long as there are valid
>moves). Even this little will produce a few blind spots, moves that the
>playouts cannot see, and systematically wrong results. Adding more
>go-specific knowledge can make the results much better (more likely to be
>right), but can also add some more blind spots. And it costs time, reducing
>the number of playouts the program can make.
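
The playout-averaging idea described above can be sketched in a few lines.
This is only a minimal illustration under loud assumptions: it uses
tic-tac-toe as a toy stand-in for Go, the playouts are uniformly random with
no tactical rules at all, and it scores only the immediate moves (often
called flat Monte Carlo) rather than growing a search tree. All function
names here are hypothetical and not taken from any Go program.

```python
import random

# Toy stand-in for Go (an assumption for illustration): tic-tac-toe,
# board is a list of 9 cells holding 'X', 'O', or None.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that side has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_playout(board, to_move, rng):
    """Play uniformly random legal moves to the end of the game.

    Returns the winner, or None if the board fills up with no winner.
    """
    board = list(board)  # work on a copy
    while winner(board) is None:
        empty = [i for i, v in enumerate(board) if v is None]
        if not empty:
            return None  # draw
        board[rng.choice(empty)] = to_move
        to_move = 'O' if to_move == 'X' else 'X'
    return winner(board)


def estimate_win_rate(board, to_move, player, n_playouts, rng):
    """Crude evaluation: fraction of random playouts that `player` wins."""
    wins = sum(random_playout(board, to_move, rng) == player
               for _ in range(n_playouts))
    return wins / n_playouts


def best_move_by_playouts(board, player, n_playouts, rng):
    """Score every legal move by the win rate of the resulting position."""
    opponent = 'O' if player == 'X' else 'X'
    scores = {}
    for move in (i for i, v in enumerate(board) if v is None):
        child = list(board)
        child[move] = player
        scores[move] = estimate_win_rate(child, opponent, player,
                                         n_playouts, rng)
    return max(scores, key=scores.get), scores
```

For example, with `board = ['X', 'X', None, 'O', 'O', None, None, None, None]`
and X to move, `best_move_by_playouts(board, 'X', 200, random.Random(0))`
picks cell 2, because every playout from that child position is already a
win for X. A real program would use these averaged results to decide which
branch of its search tree to expand next, rather than stopping at one ply.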
>
>Hope that explains something of the mystery
>
>
>Regards
>
>   Heikki

-- 
[EMAIL PROTECTED] (Kato)
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
