David Silver wrote:
>Yes, in our experiments they were just constant numbers M=N=100.

If M and N are the same, is there any reason to run the M simulations
and the N simulations separately?  What happens if you combine them and
calculate V and g in a single loop?

>Okay, let's continue the example above. Let's say that in position s,
>using the current theta, moves a, b and c will be selected with
>probability 0.5, 0.3 and 0.2 respectively. Let's say that move a was
>actually selected. Now consider pattern 1, this is matched after a.
>But the probability of matching was (0.5*1 + 0.3*1 + 0.2*0) = 0.8. So
>psi_1(s,a) = 1 - 0.8 = 0.2. For pattern 2,
>psi_2(s,a) = 1 - (0.5*1 + 0.3*0 + 0.2*1) = 0.3, etc. So in this example
>the vector psi(s,a) = [0.2, 0.3, -0.3, -0.2].
>
>In other words, psi tells us whether each pattern was actually matched  
>more or less than we could have expected.
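
To check my reading of the quoted example, here is a minimal Python
sketch of the arithmetic. The move probabilities are taken from the
quoted text. The 0/1 pattern table (phi) is my assumption: the entries
for patterns 1 and 2 follow from the sums quoted above, while those for
patterns 3 and 4 are inferred from the resulting vector and are not
stated in this message. The names probs, phi and psi are hypothetical,
not taken from David's code.

```python
# Probability of selecting each move in position s under the current theta
# (values from the quoted example).
probs = {"a": 0.5, "b": 0.3, "c": 0.2}

# phi[move][k] is 1 if pattern k+1 is matched after that move, else 0.
# ASSUMPTION: this table is reconstructed from the quoted numbers.
phi = {
    "a": [1, 1, 0, 0],
    "b": [1, 0, 1, 0],
    "c": [0, 1, 0, 1],
}

def psi(selected, probs, phi):
    """Observed pattern matches minus their expectation under the policy."""
    n = len(phi[selected])
    expected = [sum(probs[m] * phi[m][k] for m in probs) for k in range(n)]
    return [phi[selected][k] - expected[k] for k in range(n)]

print([round(x, 2) for x in psi("a", probs, phi)])
# prints [0.2, 0.3, -0.3, -0.2]
```

With this table, the psi vectors of the three moves, weighted by their
selection probabilities, sum to zero in every component, which matches
the "more or less than expected" description.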

I understand what psi is now. I am still not sure how it works, but at
least I can see what your algorithm does. Thanks.

--
Yamato
_______________________________________________
computer-go mailing list
[email protected]
http://www.computer-go.org/mailman/listinfo/computer-go/
