Hi Aja,
I would be interested in your results. I think the LGRF policy is only a
small first step in the direction of more adaptive playouts (and,
hopefully, of overcoming the horizon effect).
As for the Last-Bad-Reply idea, you can read about my experiences with
this and related policies in my Master's thesis, if you're interested.
It contains the idea that resulted in the "Power of Forgetting" paper as
well.
http://www.ke.tu-darmstadt.de/lehre/arbeiten/master/2010/Baier_Hendrik.pdf
regards,
Hendrik
I admit that it's difficult for me to include such a deterministic default
policy. :-)
With a softmax policy, using the "last-LOST-reply" information may be a good
direction.
Aja