Hi David,

Thanks for sharing your experiments. They are very interesting.

I tried chain pooling too, and it was too slow. It made the network about twice 
as slow in TensorFlow (using tf.unsorted_segment_sum or max). I'd rather have 
twice as many layers.
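
For reference, here is a minimal sketch of what I mean by chain pooling 
(illustrative only; the flattened-board layout and the chain-id input are my 
own assumptions, not code from Crazy Stone):

    import tensorflow as tf

    # features:   [batch*361, C] per-point features, with boards flattened
    #             and batched along the first dimension.
    # chain_ids:  [batch*361] int32 id of the string (or empty point) at each
    #             point, offset so ids do not collide across the batch.
    # num_chains: total number of distinct ids in the batch.
    def chain_pool_max(features, chain_ids, num_chains):
        # One pooled feature vector per chain.
        pooled = tf.math.unsorted_segment_max(features, chain_ids, num_chains)
        # Copy each chain's pooled vector back to every point of that chain.
        return tf.gather(pooled, chain_ids)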

I never tried dilated convolutions. That sounds interesting.

The value network of AQ has an interesting architecture. It does not go 
directly from 19x19 to a scalar, but works like image-recognition networks, 
with 2x2 pooling until it reaches 1x1. I have not tried it yet, but it feels 
like a good idea.
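
Something in that spirit might look like this (a rough Keras sketch of the 
idea only, not AQ's actual architecture; the layer widths are made up):

    import tensorflow as tf
    from tensorflow.keras import layers

    def value_head(trunk):
        # Shrink 19x19 -> 10 -> 5 -> 3 -> 2 -> 1 with conv + 2x2 pooling,
        # instead of jumping straight from 19x19 to a scalar.
        x = trunk
        for _ in range(5):
            x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
            x = layers.MaxPooling2D(pool_size=2, strides=2, padding="same")(x)
        x = layers.Flatten()(x)
        return layers.Dense(1, activation="tanh")(x)  # value in [-1, 1]

    trunk = tf.keras.Input(shape=(19, 19, 192))  # output of the residual tower
    model = tf.keras.Model(trunk, value_head(trunk))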

Rémi

----- Original Message -----
From: "David Wu" <lightvec...@gmail.com>
To: computer-go@computer-go.org
Sent: Wednesday, 28 February 2018 20:04:11
Subject: Re: [Computer-go] Crazy Stone is back

It's not even just liberties and semeai, it's also eyes. Consider for example a 
large dragon that has miai for two eyes in distant locations, and the opponent 
then takes one of them - you'd like the policy net to now suggest the other 
eye-making move far away. And you'd also like the value net to distinguish the 
situations where the whole group has two eyes, even when they are distant, from 
the ones where it doesn't. 


I've been doing experiments with somewhat smaller neural nets (roughly 4-7 
residual blocks = 8-14 layers), without sticking to an idealized "zero" 
approach. I've only experimented with policy nets so far, but presumably much 
of this should transfer to a value net's understanding as well. 

1. One thing I tried was chain pooling, which was neat, but ultimately didn't 
seem promising: 

https://github.com/lightvector/GoNN#chain-pooling 

It solves all of these problems when the strings are solidly connected. It also 
helps when the strings are long but not quite solidly connected, since the 
information still propagates faster than it would without it. But if a group is 
made up of lots of little strings, diagonal connections, bamboo joints, etc., 
then it won't help. Chain pooling is also computationally costly, at least in 
TensorFlow, and it might have negative effects on the rest of the neural net 
that I don't understand. 

2. A new thing I've been trying recently that actually does seem moderately 
promising is dilated convolutions, although I'm still early in testing. They 
also speed up information propagation, don't require solidly connected strings, 
and are reasonably cheap. 

In particular: my residual blocks have 192 channels, so I tried taking several 
of the later residual blocks in the net and making 64 of the channels of the 
first convolution in each block dilated (leaving 128 channels of regular 
convolutions), with dilation factors of 2 or 3. Intuitively, the idea is that 
earlier blocks can learn to compute 2x2 or 3x3 connectivity patterns, and the 
dilated convolutions in later blocks can then use that to propagate information 
several spaces at a time across connected groups or dragons. 
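
Roughly, a block along those lines looks like this (a simplified sketch with 
the channel split as described; the exact layout, normalization, and names are 
illustrative, not my actual code):

    import tensorflow as tf
    from tensorflow.keras import layers

    def mixed_dilation_block(x, regular_channels=128, dilated_channels=64,
                             dilation=2):
        # First convolution: 128 ordinary 3x3 channels plus 64 dilated 3x3
        # channels, concatenated back to the full 192-channel width.
        skip = x
        regular = layers.Conv2D(regular_channels, 3, padding="same")(x)
        dilated = layers.Conv2D(dilated_channels, 3, padding="same",
                                dilation_rate=dilation)(x)
        y = layers.Concatenate()([regular, dilated])
        y = layers.ReLU()(layers.BatchNormalization()(y))
        # Second convolution and the residual connection as usual.
        y = layers.Conv2D(regular_channels + dilated_channels, 3,
                          padding="same")(y)
        y = layers.BatchNormalization()(y)
        return layers.ReLU()(layers.Add()([skip, y]))

    x = tf.keras.Input(shape=(19, 19, 192))
    out = mixed_dilation_block(x, dilation=3)  # or dilation=2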


So far, indications are that this works. When I looked at various board 
positions, it helped in a variety of capturing-race and 
large-dragon-two-eye-miai situations, correctly suggesting moves that the net 
without dilated convolutions would fail to find because they were too far away. 
Also, dilated convolutions seem pretty cheap - they only slightly increase the 
computational cost of the net. 


So far, I've found that it doesn't significantly improve the overall loss 
function, presumably because there are now 128 channels of ordinary 
convolutions instead of 192, so in return for being better at long-distance 
interactions, the net has gotten worse at some local tactics. But it also 
hasn't gotten worse the way it would if I had simply dropped the number of 
channels from 192 to 128 without adding any new channels, so the dilated 
convolutions are being "used" for real work. 

I'd be curious to hear whether anyone else has tried dilated convolutions and 
what results they got. Short of simply adding more layers, they're the most 
promising thing I know of. 

On Wed, Feb 28, 2018 at 12:34 PM, Rémi Coulom <remi.cou...@free.fr> wrote: 


192 and 256 are the numbers of channels. They are fully connected across 
channels, so the numbers of 3x3 filters are 192^2 and 256^2 respectively. 
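
Spelling out the arithmetic (a quick illustration, ignoring biases and 
batch-norm parameters):

    # A 3x3 convolution that is fully connected across channels has one
    # 3x3 filter per (input channel, output channel) pair.
    for channels in (192, 256):
        filters = channels * channels   # 36864 and 65536
        weights = 3 * 3 * filters       # 331776 and 589824 per layer
        print(channels, filters, weights)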

Having liberty counts and string size as input helps, but it solves only a 
small part of the problem. You can't read a semeai from just the liberty-count 
information. 

I tried to be clever and find ways to propagate information along strings in 
the network. But all the techniques I tried make the network much slower. 
Adding more layers is simple and works. 

Rémi 

----- Original Message ----- 
From: "Darren Cook" <dar...@dcook.org> 
To: computer-go@computer-go.org 
Sent: Wednesday, 28 February 2018 16:43:10 
Subject: Re: [Computer-go] Crazy Stone is back 

> Weights_31_3200 is 20 layers of 192, 3200 board evaluations per move 
> (no random playout). But it still has difficulties with very long 
> strings. My next network will be 40 layers of 256, like Master. 

"long strings" here means solidly connected stones? 

The 192 vs. 256 is the number of 3x3 convolution filters? 

Has anyone been doing experiments with, say, 5x5 filters (and fewer 
layers), and/or putting more raw information in (e.g. liberty counts - 
which makes the long string problem go away, if I've understood 
correctly what that is)? 

Darren 

_______________________________________________
Computer-go mailing list
Computer-go@computer-go.org
http://computer-go.org/mailman/listinfo/computer-go