djhbrown .: 
>thank you for sharing the paper.
>"the Maximum Frequency method is based on the
>maximization of the difference between the expected reward of
>the optimal move and that of others"
>intuitively it feels that biasing random search towards the optimal route
>would yield reduced failure rates, yet it does seem to depend on knowing
>what the optimal route is beforehand.

UCT is not a random search; it is deterministic.
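
To illustrate, here is a minimal sketch of the selection step, assuming the standard UCB1 rule. The function name, the exploration constant c, and the index-order tie-breaking are my own choices for illustration and are not taken from the paper. Given the same win and visit counts, it always returns the same child; no random number is drawn.

```python
import math

def ucb1_select(wins, visits, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    wins[i] and visits[i] are the statistics of child i (hypothetical layout).
    """
    total = sum(visits)
    best, best_score = 0, -math.inf
    for i, (w, n) in enumerate(zip(wins, visits)):
        if n == 0:
            # Assumption: unvisited children are tried first, in index order.
            return i
        # Mean reward plus exploration bonus; both depend only on the counts.
        score = w / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best

# Same statistics in, same choice out.
choice = ucb1_select([6, 3, 1], [10, 10, 2])
```

This covers only the in-tree selection; how the leaf evaluations are produced is outside this sketch.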

Maximizing KL-divergence just speeds up the convergence of the iterative 


>if i knew the optimal route to get from A to B, i wouldn't bother doing a
>random search, but just follow it.
>"This property [“bias in suboptimal moves”] means that the impact of
>missing the optimal move is much greater for one player than it is for the
>i find this conclusion puzzling because Go is a zero-sum game, so what is
>good for one side is equally bad for the other, not variably so.  I have
>not checked the statistical inference calculations to see whether there is
>an error in them.

>Computer-go mailing list


Hideki Kato <>
