Hi Yamato,
Thanks for the detailed explanation.
M, N and alpha are constant numbers, right? What did you set them to?
You're welcome!
Yes, in our experiments they were just constant numbers, M=N=100.
The feature vector is the set of patterns you use, with value 1 if a
pattern is matched and 0 otherwise. The simulation policy selects
actions in proportion to the exponentiated, weighted sum of all
matching patterns. For example, let's say move a matches patterns 1 and 2,
move b matches patterns 1 and 3, and move c matches patterns 2 and
4. Then move a would be selected with probability e^(theta1 +
theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 +
theta4)). The theta values are the weights on the patterns which we
would like to learn. They are the log of the Elo ratings in Remi
Coulom's approach.
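
The selection rule above can be sketched in a few lines of Python. This is
only a minimal illustration, not our actual code: the theta values are made
up, and the names theta, matches and policy are hypothetical.

```python
import math

# Hypothetical pattern weights; these numbers are invented for illustration.
theta = {1: 0.5, 2: -0.2, 3: 0.1, 4: 0.0}

# Which patterns each move matches, as in the example above.
matches = {"a": [1, 2], "b": [1, 3], "c": [2, 4]}

def policy(theta, matches):
    """Return each move's selection probability under the softmax rule."""
    # Exponentiate the summed weights of the patterns each move matches.
    scores = {move: math.exp(sum(theta[p] for p in pats))
              for move, pats in matches.items()}
    total = sum(scores.values())
    # Normalise so the probabilities sum to 1.
    return {move: score / total for move, score in scores.items()}

probs = policy(theta, matches)
for move, prob in probs.items():
    print(move, round(prob, 3))
```

With these made-up weights move b gets the largest probability, because
theta1 + theta3 is the largest of the three sums.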
OK, I guess it is the formula 5 in the paper.
Yes, exactly.
The only tricky part is computing the vector psi(s,a). Each component
of psi(s,a) corresponds to a particular pattern, and is the difference
between the observed feature (i.e. whether the pattern actually
occurred after move a in position s) and the expected feature (the
average value of the pattern, weighted by the probability of selecting
each action).
I still don't understand this. Is it the formula 6?
Could you please give me an example like the above?
Yes, that's right, this is equation 6.
Okay, let's continue the example above. Let's say that in position s,
using the current theta, moves a, b and c will be selected with
probability 0.5, 0.3 and 0.2 respectively. Let's say that move a was
actually selected. Now consider pattern 1, which is matched after a.
But the probability of matching it was (0.5*1 + 0.3*1 + 0.2*0) = 0.8. So
psi_1(s,a) = 1 - 0.8 = 0.2. For pattern 2, psi_2(s,a) = 1 -
(0.5*1 + 0.3*0 + 0.2*1) = 0.3. Patterns 3 and 4 are not matched after a,
so psi_3(s,a) = 0 - 0.3 = -0.3 and psi_4(s,a) = 0 - 0.2 = -0.2. So in
this example the vector psi(s,a) = [0.2, 0.3, -0.3, -0.2].
In other words, psi tells us whether each pattern was actually matched
more or less than we could have expected.
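
To make the arithmetic concrete, here is a small Python sketch of the psi
computation for the example above. It assumes the move probabilities are
already known; the function name psi and the dictionary layout are just
hypothetical choices for illustration, not code from our program.

```python
def psi(probs, matches, chosen, patterns):
    """Observed minus expected feature value for each pattern."""
    result = []
    for p in patterns:
        # Observed feature: 1 if the chosen move matches pattern p, else 0.
        observed = 1.0 if p in matches[chosen] else 0.0
        # Expected feature: total probability of the moves that match p.
        expected = sum(prob for move, prob in probs.items()
                       if p in matches[move])
        result.append(observed - expected)
    return result

# Selection probabilities and pattern matches from the example.
probs = {"a": 0.5, "b": 0.3, "c": 0.2}
matches = {"a": {1, 2}, "b": {1, 3}, "c": {2, 4}}

vec = psi(probs, matches, "a", [1, 2, 3, 4])
print([round(x, 3) for x in vec])  # [0.2, 0.3, -0.3, -0.2]
```

Note that the components always sum to zero when every move matches the
same number of patterns, as it does here (two each).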
Hope this helps.
-Dave
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/