On Tue, Apr 27, 2010 at 05:54:33PM +0200, Olivier Teytaud wrote:
> > My problem is that I can't find many papers about learning of MC playout
> > policies, in particular patterns.
>
> A just-published paper about learning MC policies:
> http://hal.inria.fr/inria-00456422/fr/
> It works quite well for Havannah (not tested on hex I think).
Very interesting! Why have you restricted the tiling to both actions being performed by the same player? This seems to capture "if I played X before, following up with Y will be good", but wouldn't "if the opponent played X before, replying with Y will be good" be at least as useful?

Have you considered also discouraging replies that give very bad results?

One thing I have hit when trying to implement something like this is that minimax prunes a lot of interesting situations: if the sequence A-B-C is good, minimax will quickly redirect to a less good A-X-C, even though in simulations B is very likely to be played.

> But in the case of Go, the Wang-policy is too strong for being improved
> like that.

Does this imply that you have tried to implement it and weren't successful, or is it just a feeling?

> (fill board and nakade in http://hal.inria.fr/inria-00386477/)

Thanks, and here I thought I knew about all the recent computer-go papers... :-)

--
				Petr "Pasky" Baudis
When I feel like exercising, I just lie down until
the feeling goes away.  -- xed_over
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
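[Editor's note: a minimal sketch of the "follow-up pattern" idea discussed above, i.e. learning how well replying with Y works after one's own previous move X, from playout outcomes. All names here are illustrative assumptions, not from the paper or from any Go engine.]

```python
import random
from collections import defaultdict

class FollowupTable:
    """Learn 'if I played X before, following up with Y is good' statistics
    from simulated playout results, then bias playout move selection."""

    def __init__(self):
        # (previous_own_move, candidate_move) -> [wins, plays]
        self.stats = defaultdict(lambda: [0, 0])

    def update(self, own_moves, won):
        # own_moves: this player's moves, in order, from one finished playout
        for prev, cur in zip(own_moves, own_moves[1:]):
            s = self.stats[(prev, cur)]
            s[0] += 1 if won else 0
            s[1] += 1

    def weight(self, prev, move, prior=0.5, prior_games=2):
        # Smoothed win rate: unseen (prev, move) pairs stay near the prior,
        # so the policy does not overcommit on tiny sample sizes.
        w, n = self.stats[(prev, move)]
        return (w + prior * prior_games) / (n + prior_games)

    def pick(self, prev, candidates, rng=random):
        # Sample a reply with probability proportional to its learned weight;
        # pairs with very bad results are thereby also discouraged.
        ws = [self.weight(prev, m) for m in candidates]
        return rng.choices(candidates, weights=ws, k=1)[0]
```

Extending the key to `(previous_opponent_move, candidate_move)` would express the "reply to the opponent" variant asked about above with the same machinery.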
