I have used TensorFlow to train a CNN that predicts the next move, with an architecture similar to what others have used (1 layer of 5x5 convolutions followed by 10 more layers of 3x3 convolutions, with 192 hidden units per layer and ReLU activation functions) but with much simpler inputs. I found the Python parts worked quite well, and that's how I did the training, using a GTX 980 GPU. But getting the trained network to work from C++ was a pain, and I ended up rolling my own code to evaluate it using the CPU(s).
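Roughly, that shape of network can be written down in a few lines of current tf.keras notation; the number of input feature planes and the softmax move head below are placeholders for illustration, not my actual inputs:

import tensorflow as tf

def build_policy_net(num_input_planes=8, filters=192):
    # 19x19 board with some number of feature planes per point (placeholder).
    board = tf.keras.Input(shape=(19, 19, num_input_planes))
    # One 5x5 convolutional layer followed by ten 3x3 layers,
    # all with 192 filters and ReLU activations.
    x = tf.keras.layers.Conv2D(filters, 5, padding="same", activation="relu")(board)
    for _ in range(10):
        x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    # Collapse to one value per point and softmax over the 361 intersections
    # to get a distribution over the next move.
    logits = tf.keras.layers.Flatten()(tf.keras.layers.Conv2D(1, 1, padding="same")(x))
    return tf.keras.Model(board, tf.keras.layers.Softmax()(logits))

model = build_policy_net()
model.compile(optimizer="adam", loss="categorical_crossentropy")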
I also tried to train the network to predict the final ownership for each point on the board (in the hopes that it would learn about life and death), but this didn't work too well, presumably because I didn't have good enough data to train it with.

Álvaro.


On Thu, Mar 24, 2016 at 2:42 PM, Darren Cook <dar...@dcook.org> wrote:
> Thanks for the very interesting replies, David, and Remi.
>
> No-one is using TensorFlow, then? Any reason not to? (I'm just curious
> because there looks to be a good Udacity DNN course
> (https://www.udacity.com/course/deep-learning--ud730), which I was
> considering, but it is using TensorFlow.)
>
> Remi wrote:
> > programming back-propagation efficiently on the GPU. We did get a GPU
> > version working, but it took a lot of time to program it, and was not
> > so efficient. So the current DCNN of Crazy Stone is 100% trained on
> > the CPU, and 100% running on the CPU. My CPU code is efficient,
> > though. It is considerably faster than Caffe. My impression is that
> > Caffe is inefficient because it uses the GEMM approach, which may be
> > good for high-resolution pictures, but is not for small 19x19
> > boards.
>
> I did a bit of study on what GEMM is, and found this article and the 2nd
> comment on it quite interesting:
>
> http://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/
>
> The comment, by Scott Gray, mentioned:
>
> So instead of thinking of convolution as a problem of one large gemm
> operation, it’s actually much more efficient as many small gemms. To
> compute a large gemm on a GPU you need to break it up into many small
> tiles anyway. So rather than waste time duplicating your data into a
> large matrix, you can just start doing small gemms right away directly
> on the data. Let the L2 cache do the duplication for you.
>
> He doesn't quantify large vs. small; though I doubt anyone is doing
> image recognition on 19x19 pixel images :-)
>
> Darren
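For concreteness, here is a minimal NumPy sketch of the im2col + GEMM convolution that Remi and the linked article refer to (the function name and shapes are only illustrative): every input value gets copied into roughly nine rows of the patch matrix, which is the duplication overhead that matters more on a small 19x19 board than on a large image.

import numpy as np

def conv3x3_gemm(x, w):
    """x: (19, 19, cin) board planes; w: (3, 3, cin, cout) filters; 'same' padding."""
    cin, cout = x.shape[2], w.shape[3]
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    # im2col: one row per board point, each holding a flattened 3x3 patch.
    cols = np.empty((19 * 19, 3 * 3 * cin))
    for i in range(19):
        for j in range(19):
            cols[i * 19 + j] = xp[i:i + 3, j:j + 3, :].ravel()
    # A single large GEMM: (361, 9*cin) @ (9*cin, cout) -> (361, cout).
    out = cols @ w.reshape(3 * 3 * cin, cout)
    return out.reshape(19, 19, cout)

The alternative described in the quoted comment skips building the patch matrix altogether and runs many small matrix multiplies directly on the (padded) data.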