David Silver wrote:
> Yes, in our experiments they were just constant numbers M=N=100.

If M and N are the same, is there any reason to run the M simulations and the N simulations separately? What happens if you combine them and calculate V and g in a single loop?

> Okay, let's continue the example above. Let's say that in position s,
> using the current theta, moves a, b and c will be selected with
> probability 0.5, 0.3 and 0.2 respectively. Let's say that move a was
> actually selected. Now consider pattern 1, this is matched after a.
> But the probability of matching was (0.5*1 + 0.3*1 + 0.2*0) = 0.8. So
> psi_1(s,a) = 1 - 0.8 = 0.2. For pattern 2, psi_2(s,a) = 1 -
> (0.5*1 + 0.3*0 + 0.2*1) = 0.3, etc. So in this example the vector
> psi(s,a) = [0.2, 0.3, -0.3, -0.2].
>
> In other words, psi tells us whether each pattern was actually matched
> more or less than we could have expected.

I understand what psi is now. I am not sure how it works, but anyway I can see your algorithm now. Thanks.

--
Yamato
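For anyone who wants to check the arithmetic in the quoted psi example, here is a minimal Python sketch. The names `probs`, `phi` and `psi` are my own, not from David's code. The match table for patterns 1 and 2 comes from the quoted text. The rows for patterns 3 and 4 are an assumption: I inferred them from the quoted psi values, taking each feature to be either 0 or 1.

```python
# Move probabilities under the current theta, from the quoted example.
probs = {"a": 0.5, "b": 0.3, "c": 0.2}

# phi[move][i] is 1 if pattern i+1 is matched after that move, else 0.
# Patterns 1 and 2 follow the quoted text; patterns 3 and 4 are inferred
# from the quoted psi values (assumption: features are binary).
phi = {
    "a": [1, 1, 0, 0],
    "b": [1, 0, 1, 0],
    "c": [0, 1, 0, 1],
}


def psi(selected):
    """Features of the selected move minus their expectation over all moves."""
    n = len(phi[selected])
    expected = [sum(probs[m] * phi[m][i] for m in probs) for i in range(n)]
    return [phi[selected][i] - expected[i] for i in range(n)]


psi_a = psi("a")
print([round(x, 2) for x in psi_a])  # [0.2, 0.3, -0.3, -0.2]
```

A positive entry means the pattern was matched more often than expected under the current policy, and a negative entry means less often, which is how I read the last quoted paragraph.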
