I wish I was smart :(

David Silver wrote:

Hi Remi,

> I understood this. What I find strange is that using -1/1 should be
> equivalent to using 0/1, but your algorithm behaves differently: it
> ignores lost games with 0/1, and uses them with -1/1.

> Imagine you add a big constant to z. One million, say. This does not
> change the problem. You get either 1000000 or 1000001 as the outcome of a
> playout. But then, your estimate of the gradient becomes complete noise.
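The million-offset thought experiment is easy to check numerically. Here is a minimal sketch (my own illustration, not from the paper), assuming a one-parameter Bernoulli policy on a two-armed bandit: shifting every reward by a constant leaves the expected score-function gradient unchanged, but the per-sample estimate is swamped by noise proportional to the offset.

```python
import numpy as np

# Illustrative REINFORCE setup (not the paper's): a one-parameter policy
# pi(a=1) = sigmoid(theta) on a two-armed bandit, where arm 1 wins with
# probability 0.6 and arm 0 with probability 0.4.
# Score-function gradient sample: g = (z + c) * d/dtheta log pi(a).
rng = np.random.default_rng(0)
theta = 0.3
p1 = 1.0 / (1.0 + np.exp(-theta))            # probability of playing arm 1

def grad_samples(offset, n=200_000):
    a = (rng.random(n) < p1).astype(int)       # actions sampled from the policy
    win_p = np.where(a == 1, 0.6, 0.4)
    z = (rng.random(n) < win_p).astype(float)  # 0/1 playout outcome
    dlogpi = np.where(a == 1, 1.0 - p1, -p1)   # score of the sampled action
    return (z + offset) * dlogpi

g_small = grad_samples(0.0)   # rewards in {0, 1}
g_big = grad_samples(1e6)     # rewards in {1000000, 1000001}

# Both means estimate the same true gradient (E[score] = 0, so the
# offset cancels in expectation), but the shifted estimate's standard
# deviation is of order one million.
print(g_small.mean(), g_small.std())
print(g_big.mean(), g_big.std())
```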

> So maybe using -1/1 is better than 0/1? Since your algorithm depends
> so much on the definition of the reward, there must be an optimal way
> to set the reward. Or there must be a better way to define an algorithm
> that would not depend on an offset in the reward.

> There is still something wrong that I don't understand. There may be a
> way to quantify the amount of noise in the unbiased gradient estimate,
> and it would depend on the average reward. Probably setting the
> average reward to zero is what would minimize noise in the gradient
> estimate. This is just an intuitive guess.

Okay, now I understand your point :-) It's a good question - and I think
you're right. In REINFORCE any baseline can be subtracted from the
reward without affecting the expected gradient, but possibly reducing
its variance. The baseline leading to the best estimate is indeed the
average reward. So it should be the case that {-1,+1} would estimate
the gradient g more efficiently than {0,1}, assuming that we see similar
numbers of black wins and white wins across the training set.
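The claim that the average reward is the best baseline can be checked empirically. A quick sketch (synthetic numbers, not from the paper): sweep the baseline b and measure the variance of (z - b) * score. With 0/1 outcomes and balanced wins, the variance bottoms out at b = 0.5, the mean reward.

```python
import numpy as np

# Empirical variance of the REINFORCE estimate (z - b) * score for a
# sweep of baselines b. Outcomes and scores are synthetic stand-ins:
# z is a balanced 0/1 playout result, score mimics d log pi / d theta.
rng = np.random.default_rng(2)
n = 500_000
z = (rng.random(n) < 0.5).astype(float)   # 0/1 outcomes, mean reward ~0.5
score = rng.normal(size=n)                # stand-in for the policy score

for b in [-1.0, 0.0, 0.5, 1.0, 2.0]:
    g = (z - b) * score
    print(f"b = {b:4.1f}  Var[g] = {g.var():.3f}")
```

In this toy setting the variance is 0.25 + (0.5 - b)^2, so the minimum sits exactly at the mean reward; with real playouts, where the score and the outcome are correlated, the mean reward is only approximately optimal.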

So to answer your question, we can safely modify the algorithm to use
(z-b) instead of z, where b is the average reward. This would then make
the {0,1} and {-1,+1} cases equivalent (with appropriate scaling of
step-size). I don't think this would have affected the results we
presented (because all of the learning algorithms converged anyway, at
least approximately, during training) but it could be an important
modification for larger boards.
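The equivalence can be seen directly: with b equal to the average reward, the {0,1} and {-1,+1} encodings produce the same gradient samples up to a factor of 2, which the step-size absorbs. A hedged sketch with synthetic outcomes and scores:

```python
import numpy as np

# Synthetic demonstration (not the paper's setup): subtracting the mean
# reward b makes the two encodings coincide, since (z11 - mean(z11)) is
# exactly 2 * (z01 - mean(z01)) when z11 = 2 * z01 - 1.
rng = np.random.default_rng(1)
n = 100_000
won = rng.random(n) < 0.5              # playout outcomes, roughly balanced
score = rng.normal(size=n)             # stand-in for d log pi / d theta

z01 = won.astype(float)                # {0, 1} encoding
z11 = 2.0 * z01 - 1.0                  # {-1, +1} encoding

g01 = (z01 - z01.mean()) * score       # baseline = average reward (~0.5)
g11 = (z11 - z11.mean()) * score       # baseline ~ 0

print(np.allclose(2.0 * g01, g11))     # identical up to the factor of 2
```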
-Dave
_______________________________________________
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
