Hi,

I am working on a CNN for winrate and territory:

Approach:
 - input: 2 layers, for black and white stones
 - 1st output: 1 layer of territory (0.0 for owned by White, 1.0 for owned by Black; I used SIGMOID because I overlooked TANH at first)
 - 2nd output: a label from -60 to +60 for the territory lead of Black
The network is trained on the loss of both outputs.
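
A minimal sketch of how the two losses could be combined into one training objective. The loss functions are my assumption, since they are not stated above: binary cross-entropy for the sigmoid territory layer, categorical cross-entropy for the margin label, and a hypothetical `weight` to balance the two. All function names are made up for illustration.

```python
import math

def territory_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over board points.

    pred and target are flat lists of values in [0, 1]
    (0.0 = owned by White, 1.0 = owned by Black).
    """
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)

def margin_loss(probs, margin, min_margin=-60, eps=1e-7):
    """Cross-entropy for the margin label; probs[i] belongs to margin min_margin + i."""
    return -math.log(max(probs[margin - min_margin], eps))

def combined_loss(terr_pred, terr_target, margin_probs, margin, weight=1.0):
    """Sum of both losses; weight (assumed, not from the post) scales the margin term."""
    return (territory_loss(terr_pred, terr_target)
            + weight * margin_loss(margin_probs, margin))
```

Because the territory targets come from playout statistics, they can be fractional, and binary cross-entropy accepts such soft targets unchanged.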

The idea is that this way I do not have to put komi into the input; instead I derive the winrate from the statistics of the trained label:

e.g. komi 6.5: I sum the probabilities from +7 to +60 and get something like a winrate.
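
The summation can be sketched as below. I assume here that the second output is a probability distribution with one class per integer margin from -60 to +60 (121 classes); the function name and argument layout are hypothetical.

```python
def winrate_from_margin_probs(probs, komi, min_margin=-60, max_margin=60):
    """Sum the probability of all margins that beat komi for Black.

    probs[i] is the predicted probability that Black leads by
    (min_margin + i) points on the board, before komi is applied.
    """
    expected_len = max_margin - min_margin + 1
    if len(probs) != expected_len:
        raise ValueError(f"expected {expected_len} probabilities, got {len(probs)}")
    return sum(p for i, p in enumerate(probs) if min_margin + i > komi)

# Example: a uniform distribution over all 121 margins.
# With komi 6.5, the margins +7..+60 count as Black wins (54 of 121 classes).
uniform = [1.0 / 121] * 121
black_winrate = winrate_from_margin_probs(uniform, komi=6.5)
```

The same network output can be re-evaluated for any komi by changing only the `komi` argument, which is the point of not feeding komi into the input.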

I trained with 800000 positions, with territory information obtained from 500 oakfoam playouts, which I symmetrized with the 8 board transformations, giving >6000000 positions. (Producing the positions is expensive because of the playouts.)
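
For illustration, the 8 transformations (4 rotations, each with and without mirroring) can be generated as in this sketch. The helper names are hypothetical and a plane is represented as a plain list of rows; this is not the code actually used for the training set.

```python
def rotate90(grid):
    """Rotate a square grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def mirror(grid):
    """Mirror a grid left to right."""
    return [list(row[::-1]) for row in grid]

def symmetries(grid):
    """Return the 8 symmetric variants: 4 rotations, each plain and mirrored."""
    result = []
    current = [list(row) for row in grid]
    for _ in range(4):
        result.append(current)
        result.append(mirror(current))
        current = rotate90(current)
    return result

# Example on a tiny 3x3 "board" with distinct entries.
variants = symmetries([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

The same transformation has to be applied to both stone layers and to the territory layer of a position, while the margin label stays unchanged. 800000 positions times 8 gives 6400000 training positions.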

The layers are the same as in the large network from Christopher Clark <http://arxiv.org/find/cs/1/au:+Clark_C/0/1/0/all/0/1> and Amos Storkey <http://arxiv.org/find/cs/1/au:+Storkey_A/0/1/0/all/0/1>: http://arxiv.org/abs/1412.3409


I get reasonable territory predictions from this network (compared to 500 playouts of oakfoam), but the winrates seem to be overestimated. Anyway, it looks worth doing some more work on it.

The idea is that, at some point, I can get the equivalent of, let's say, 1000 playouts from one call to the CNN for the cost of 2 playouts...


Now I am trying to make a soft transition from conventional playouts to CNN-predicted winrates within the MC framework.

I do have some ideas, but I am not happy with them.

Maybe you have better ones :)


Thanks a lot

Detlef

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go
