Hi Yamato,

Thanks for the detailed explanation. M, N and alpha are constant numbers, right? What did you set them to?

You're welcome! Yes, in our experiments they were just constants: M = N = 100.

The feature vector is the set of patterns you use, with value 1 if a pattern is matched and 0 otherwise. The simulation policy selects actions in proportion to the exponentiated, weighted sum of all matching patterns. For example, let's say move a matches patterns 1 and 2, move b matches patterns 1 and 3, and move c matches patterns 2 and 4. Then move a would be selected with probability e^(theta1 + theta2) / (e^(theta1 + theta2) + e^(theta1 + theta3) + e^(theta2 + theta4)). The theta values are the weights on the patterns which we would like to learn. They are the log of the Elo ratings in Remi Coulom's approach.

OK, I guess it is the formula 5 in the paper.

Yes, exactly.
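
As a rough illustration, the selection rule in the a/b/c example can be sketched in Python. The theta values below are made up purely for illustration (they are not from the paper or the experiments), and the names `theta`, `matches` and `policy` are hypothetical.

```python
import math

# Hypothetical weights for patterns 1-4 (illustrative values only).
theta = {1: 0.5, 2: -0.2, 3: 0.1, 4: 0.3}

# Patterns matched by each candidate move, as in the example:
# a matches 1 and 2, b matches 1 and 3, c matches 2 and 4.
matches = {"a": [1, 2], "b": [1, 3], "c": [2, 4]}


def policy(theta, matches):
    """Select each move with probability proportional to
    exp(sum of the weights of the patterns it matches)."""
    scores = {
        move: math.exp(sum(theta[p] for p in pats))
        for move, pats in matches.items()
    }
    total = sum(scores.values())
    return {move: score / total for move, score in scores.items()}


probs = policy(theta, matches)
for move, p in probs.items():
    print(move, p)
```

With these made-up weights, move b gets the largest share because theta1 + theta3 is the largest of the three sums.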

The only tricky part is computing the vector psi(s,a). Each component of psi(s,a) corresponds to a particular pattern, and is the difference between the observed feature (i.e. whether the pattern actually occurred after move a in position s) and the expected feature (the average value of the pattern, weighted by the probability of selecting each action).

I still don't understand this. Is it the formula 6? Could you please give me an example like the above?

Yes, that's right, this is equation 6.

Okay, let's continue the example above. Let's say that in position s, using the current theta, moves a, b and c will be selected with probability 0.5, 0.3 and 0.2 respectively. Let's say that move a was actually selected. Now consider pattern 1, this is matched after a. But the probability of matching was (0.5*1 + 0.3*1 + 0.2*0) = 0.8. So psi_1(s,a) = 1 - 0.8 = 0.2. For pattern 2, psi_2(s,a) = 1 - (0.5*1 + 0.3*0 + 0.2*1) = 0.3, etc. So in this example the vector psi(s,a) = [0.2, 0.3, -0.3, -0.2].

In other words, psi tells us whether each pattern was actually matched more or less than we could have expected.
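
The same arithmetic can be checked with a short Python sketch. It only reproduces the numbers in this example: the selection probabilities 0.5, 0.3 and 0.2 are taken as given rather than computed from theta, and the helper names `feature` and `psi` are hypothetical.

```python
# Patterns matched by each move, and the selection probabilities
# assumed in the example (given, not derived from theta here).
matches = {"a": [1, 2], "b": [1, 3], "c": [2, 4]}
probs = {"a": 0.5, "b": 0.3, "c": 0.2}
patterns = [1, 2, 3, 4]


def feature(move, pattern):
    """1.0 if the pattern is matched after the move, else 0.0."""
    return 1.0 if pattern in matches[move] else 0.0


def psi(selected, probs):
    """Observed feature minus expected feature, one entry per pattern."""
    result = []
    for p in patterns:
        # Expected feature: average over moves, weighted by selection probability.
        expected = sum(probs[m] * feature(m, p) for m in probs)
        result.append(feature(selected, p) - expected)
    return result


psi_a = psi("a", probs)
print([round(x, 2) for x in psi_a])  # [0.2, 0.3, -0.3, -0.2]
```

Patterns 3 and 4 come out negative because move a does not match them, while they had a nonzero chance of being matched through moves b and c.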

Hope this helps. -Dave

_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/