Softmax activation looks pretty interesting! I guess in that case you'd need to 
change the meaning of the outputs to (prob of single win, prob of single loss, 
prob of gammon win, prob of gammon loss, prob of bg win, prob of bg loss). Then 
they all have to sum to 1, but there's no restriction that one be larger or 
smaller than another - i.e. instead of having a "prob of any win" output, you'd 
just sum the three win probabilities.
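
To make that concrete, here's a minimal sketch (plain Python/NumPy, nothing 
from gnubg - the outcome names and numbers are just made up for illustration):

import numpy as np

# Hypothetical ordering of the six mutually exclusive outcomes.
LABELS = ["single_win", "single_loss", "gammon_win",
          "gammon_loss", "bg_win", "bg_loss"]

def softmax(z):
    """Numerically stable softmax: all outputs positive, summing to 1."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Pretend these are the six pre-activation values from the hidden layer.
logits = np.array([1.2, 0.8, 0.3, -0.5, -2.0, -3.0])
p = softmax(logits)

# "Prob of any win" is no longer its own output; it's just a sum, so
# P(gammon win) can never exceed P(win) by construction.
p_win = p[0] + p[2] + p[4]
print(dict(zip(LABELS, p.round(3))), "P(win) =", round(p_win, 3))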

On training with conditional probabilities: it actually makes no difference in 
the middle of the game - the "new" value you're training against is just the 
network's estimate of the conditional probability at the next position, so no 
division is necessary. You have to be careful at the end of the game, though - 
i.e. do you train the conditional gammon win node if the player loses? I'm 
finding a fair bit of sensitivity to assumptions about this, and I'm probably 
doing something wrong there. :)
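
Here's roughly what I mean, as a sketch in Python (the function and argument 
names are just for illustration, not from any actual code):

def gammon_cond_target(game_over, player_won, was_gammon, next_cond_estimate):
    """TD(0)-style training target for a 'P(gammon win | win)' output node.

    Mid-game: the target is just the network's conditional estimate at the
    successor position, so no division by P(win) is needed - the output
    and its target are both already conditional on winning.

    End of game: the conditioning event only happens when the player wins.
    On a loss the conditional target is ill-defined; returning None here
    means "skip the update for this node", which is one possible convention
    (and exactly the kind of assumption I'm finding sensitivity to).
    """
    if not game_over:
        return next_cond_estimate          # network's own estimate at the next position
    if player_won:
        return 1.0 if was_gammon else 0.0  # observed outcome, given the win
    return None                            # loss: conditioning event never happened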

Along these lines, even with the usual gammon win output: do you keep training 
it in mid-game once the opponent has borne off a checker (so a gammon win is no 
longer possible), or do you just stop training?
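
In code, the two options I'm weighing look something like this (again just a 
sketch with made-up names):

def gammon_win_target(opponent_has_borne_off, next_estimate):
    """Target for the usual (unconditional) gammon-win output.

    Once the opponent has borne off a checker, a gammon win is impossible,
    so the "true" value is 0.  Option A: keep training, toward that forced
    value.  Option B: stop training this output (signalled by None here).
    """
    if opponent_has_borne_off:
        return 0.0        # Option A; `return None` would be Option B
    return next_estimate  # otherwise the usual TD target from the successor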


On Dec 11, 2011, at 7:56 AM, Øystein Schønning-Johansen wrote:

> Hi Mark!
> 
> How's your rally driving going? ;-)
> 
> On Sun, Dec 11, 2011 at 4:45 AM, Mark Higgins <[email protected]> wrote:
> I notice in gnubg and other neural networks the probability of gammon gets 
> its own output node, alongside the probability of (any kind of) win.
> 
> Doesn't this sometimes mean that the estimated probability of gammon could be 
> larger than the probability of win, since both sigmoid outputs run from 0 to 
> 1?
> 
> There is a sanity check function, called after the neural net evaluation, that 
> checks that gammons don't exceed wins and backgammons don't exceed gammons.
>  
> I'm playing around with making the gammon node represent the probability of a 
> gammon win conditioned on a win; then the unconditional probability of a 
> gammon win = prob of win * conditional prob of gammon win. In that setup, 
> both outputs are free to roam (0,1) without causing inconsistencies.
> 
> That's a possibility, but I do not believe it gains anything. (This is of 
> course just a guess, since I've not tried it. And you are of course free to 
> try.) I guess you also need a similar scheme for backgammons?
>  
> Is there something I'm missing here about why this is suboptimal? Is there 
> some other way people tend to ensure that prob of gammon win <= prob of any 
> kind of win?
> 
> I guess you have to divide by the win prob in the training, which is still 
> just an estimate. Hmmm... I'm still thinking; maybe it can gain something, 
> since they kind of depend on each other.
> 
> However... what I would rather try is to have six outputs with a softmax 
> activation function. Several neural net experts recommend softmax in their 
> books and papers, and other parameter update rules (other than 
> backpropagation) have been developed based on softmax outputs.
> 
> -Øystein
> 

_______________________________________________
Bug-gnubg mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/bug-gnubg
