I mostly skimmed it, but here's what I got from it: in a simulation,
pick moves based on the leaf node's RAVE values, but discount moves
whose follow-up moves have already been taken.
The tiling simply tracks how effective a move is when combined
with a specific follow-up move. Near the start of a simulation, this
would match RAVE values. Deep in a simulation, it's highly situational
and depends on which follow-up moves remain open.
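
To make that reading concrete, here is a rough Python sketch. It is only my
interpretation of the summary above, not the paper's algorithm: the names
(PairStats, score, pick_move), the 0.5 default for unseen pairs, and the
epsilon-greedy choice are all assumptions made up for illustration.

```python
import random


class PairStats:
    """Win/visit counts for (move, follow_up) pairs played by one side.

    Hypothetical structure for illustration; not taken from the paper.
    """

    def __init__(self):
        self.wins = {}
        self.visits = {}

    def update(self, moves, won):
        # moves: ordered list of the moves one player made in a finished
        # simulation; won: whether that player won the simulation.
        for i, first in enumerate(moves):
            for follow_up in moves[i + 1:]:
                key = (first, follow_up)
                self.visits[key] = self.visits.get(key, 0) + 1
                self.wins[key] = self.wins.get(key, 0) + int(won)

    def score(self, move, open_moves):
        # Pooled win rate of `move` over follow-ups that are still open.
        # Pairs whose follow-up point is already taken contribute nothing,
        # which is the "discount" described above.
        wins = visits = 0
        for follow_up in open_moves:
            if follow_up != move:
                wins += self.wins.get((move, follow_up), 0)
                visits += self.visits.get((move, follow_up), 0)
        return wins / visits if visits else 0.5  # assumed neutral prior


def pick_move(stats, open_moves, epsilon=0.1, rng=random):
    # Assumed epsilon-greedy policy: mostly take the best-scoring open
    # move, occasionally a random one so the playout stays varied.
    candidates = sorted(open_moves)
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return max(candidates, key=lambda m: stats.score(m, open_moves))


stats = PairStats()
stats.update(["a", "b"], won=True)   # a then b, and we won
stats.update(["a", "c"], won=False)  # a then c, and we lost
early = stats.score("a", {"a", "b", "c"})  # both follow-ups still open
late = stats.score("a", {"a", "b"})        # c is already taken
```

With every follow-up still open, the score pools all the pair statistics,
which is roughly where it would agree with a RAVE-style value. Once c is
taken, only the (a, b) pair counts, so the score becomes situational.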
I hope that helps!
On May 19, 2010, at 7:50 PM, Darren Cook <[email protected]> wrote:
My problem is that I can't find many papers about learning of MC
playout policies, in particular patterns.
A just published paper about learning MC policies:
http://hal.inria.fr/inria-00456422/fr/
It works quite well for Havannah (not tested on hex I think).
I struggled with this paper ("Multiple Overlapping Tiles for
Contextual Monte Carlo Tree Search"), as it wasn't clear to me what a
"tile" was. Specifically, I couldn't work out whether they are 2d
patterns of black/white/empty, or a sequence of moves (e.g. joseki,
forcing moves, endgame sente/gote sequences, etc. in go). Or perhaps
something else altogether?
While I wear the dunce's cap and stand in the corner, is some kind
soul able to explain the idea in go terms?
Thanks,
Darren
--
Darren Cook, Software Researcher/Developer
http://dcook.org/gobet/ (Shodan Go Bet - who will win?)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)
_______________________________________________
Computer-go mailing list
[email protected]
http://dvandva.org/cgi-bin/mailman/listinfo/computer-go