Hello Heikki,

Heikki Levanto: <[EMAIL PROTECTED]>:
>On Sat, Nov 15, 2008 at 11:38:34PM +0100, [EMAIL PROTECTED] wrote:
>> Being a computer scientist but new to Go, I can grasp some of the theory.
>> The question I was trying to get across was:
>> 
>> In a game of self-play, if both parties are employing only Monte Carlo,
>> surely it's not a good conceptual representation of a human, and if the
>> reinforcement learning is based on random simulations, wouldn't it be very
>> weak when playing a real human?
>
>
>Here is another amateur answering.
>
>The way I understand it, modern Monte Carlo programs do not even try to
>emulate a human player with a random player - obviously that would not work.

I believe CrazyStone's use of patterns does so and it seems 
successful.

Hideki

>What they do is build a quite traditional search tree starting from
>the current position. They use a random playout as a crude way to evaluate a
>position. Based on this evaluation, they decide which branch of the tree to
>expand.
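(As an aside, that selection step can be sketched in a few lines of Python.
This is a hypothetical minimal example, not code from any actual program;
the UCB1 formula shown is just one common way to decide which branch to
expand.)

```python
import math

# Hypothetical node in a Monte Carlo search tree.
class Node:
    def __init__(self):
        self.wins = 0       # playouts won from this node
        self.visits = 0     # playouts run through this node
        self.children = []  # expanded child positions

def ucb1(child, parent_visits, c=1.4):
    # Unvisited children are tried first.
    if child.visits == 0:
        return float("inf")
    # Exploitation (observed win rate) plus an exploration bonus
    # that shrinks as the child accumulates visits.
    return (child.wins / child.visits
            + c * math.sqrt(math.log(parent_visits) / child.visits))

def select(node):
    # Pick the child with the best UCB1 score to expand next.
    return max(node.children, key=lambda ch: ucb1(ch, node.visits))
```

The exploration term is what keeps the tree from committing too early to
one branch on the strength of a few noisy playouts.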
>
>This is the way I understand the random playouts: If, in a given position,
>white is clearly ahead, he will win the game if both players play perfect
>moves. He is also likely to win if both players play reasonably good moves
>(say, like human amateurs), but there is a bit more of a chance that one
>player hits upon a good combination which the other misses, so the result is
>not quite as reliable. If the playouts are totally random, there is still a
>better chance for white to win, because both players make equally bad moves.
>The results have much more variation, of course. So far it does not sound
>like a very good proposal, but things change if you consider the facts that
>we don't have perfect oracles, that good humans are slow to play out a
>position, and that they cannot be integrated into a computer program.
>Random playouts, on the other hand, can be done awfully fast: tens of
>thousands of playouts in a second. Averaging the results gives a fair
>indication of who is more likely to win from that position, which is just
>what is needed to decide which part of the search tree to expand.
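(The averaging step is easy to sketch. In this hypothetical example,
`random_playout` is a stand-in for a full random game, here reduced to a
biased coin flip; a real program would play random legal moves to the end
and score the final position.)

```python
import random

def random_playout(p_white_win):
    # Stand-in for one full random game: returns 1 if White wins.
    return 1 if random.random() < p_white_win else 0

def estimate(p_white_win, n_playouts=10000):
    # Average many noisy playout results into a win-rate estimate.
    wins = sum(random_playout(p_white_win) for _ in range(n_playouts))
    return wins / n_playouts
```

Each individual playout is almost worthless as an evaluation, but the
average over thousands of them settles near the true win rate, which is
exactly what the tree needs.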
>
>The 'random' playouts are not totally random; they include a minimum of
>tactical rules (do not fill your own eyes, do not pass as long as there are
>valid moves). Even this little will produce a few blind spots (moves that
>the playouts cannot see) and systematically wrong results. Adding more
>go-specific knowledge can make the results much better (more likely to be
>right), but can also add some more blind spots. And it costs time, reducing
>the number of playouts the program can make.
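(A playout policy with just those two rules might look like the sketch
below. `legal_moves` and `is_own_eye` are hypothetical helpers a real
program would have to implement; the eye test in particular is where the
blind spots creep in.)

```python
import random

def choose_playout_move(board, color, legal_moves, is_own_eye):
    # Keep every legal move except those that would fill one of
    # our own eyes.
    candidates = [m for m in legal_moves(board, color)
                  if not is_own_eye(board, m, color)]
    if not candidates:
        return None  # pass only when no valid move remains
    return random.choice(candidates)
```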
>
>Hope that explains something of the mystery
>
>
>Regards
>
>   Heikki
--
[EMAIL PROTECTED] (Kato)
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
