I mostly skimmed it, but here's what I got from it: in a simulation, pick moves based on the leaf node's RAVE values, but discount moves whose follow-up moves have already been taken.

A tile simply tracks how effective a move is when combined with a specific follow-up move. Near the start of a simulation this would match the RAVE values. Deep in a simulation, it's highly situational, depending on which follow-up moves remain open.
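To make that concrete, here's a minimal sketch of what such a selection rule could look like. This is my own illustration, not code from the paper: all names (`rave_value`, `tile_value`, the 50/50 blend, the open-follow-up discount) are assumptions, and a real implementation would tune the weighting.

```python
def select_move(legal_moves, rave_value, tile_value, last_move, taken):
    """Pick the move with the best blended score (illustrative sketch).

    rave_value[m]      -- context-free RAVE estimate for move m
    tile_value[(a, b)] -- estimated value of playing b as a follow-up to a
    last_move          -- most recent move in the simulation (or None)
    taken              -- set of moves already played in this simulation
    """
    best, best_score = None, float("-inf")
    for m in legal_moves:
        if m in taken:
            continue
        score = rave_value.get(m, 0.5)
        # Near the start of a simulation there is little context, so the
        # score stays close to RAVE; deeper in, the pair statistic kicks in.
        if last_move is not None and (last_move, m) in tile_value:
            score = 0.5 * score + 0.5 * tile_value[(last_move, m)]
        # Discount moves whose known follow-ups have already been taken.
        followups = [b for (a, b) in tile_value if a == m]
        if followups:
            score *= sum(b not in taken for b in followups) / len(followups)
        if score > best_score:
            best, best_score = m, score
    return best
```

So a move with a mediocre RAVE value can still be preferred when the tile for (previous move, this move) is strong, and a move loses appeal once its recorded follow-ups are gone.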

I hope that helps!

Sent from my iPhone

On May 19, 2010, at 7:50 PM, Darren Cook <[email protected]> wrote:

My problem is that I can't find many papers about learning of MC playout
policies, in particular patterns.

A just published paper about learning MC policies:
http://hal.inria.fr/inria-00456422/fr/
It works quite well for Havannah (not tested on hex I think).

I struggled with this paper ("Multiple Overlapping Tiles for Contextual
Monte Carlo Tree Search"), as it wasn't clear to me what a "tile" was.
Specifically I couldn't work out if they were 2d patterns of
black/white/empty, or are they a sequence of moves (e.g. joseki,
forcing moves, endgame sente/gote sequences, etc. in go)? Or perhaps
something else altogether?

While I wear the dunce's cap and stand in the corner, is some kind soul
able to explain the idea in go terms?

Thanks,
Darren


--
Darren Cook, Software Researcher/Developer

http://dcook.org/gobet/  (Shodan Go Bet - who will win?)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go
