Hi,
I bought a GTX 1080. The time for one prediction is as follows.
            F32     F128    F256    MNIST
GTX 1080    0.48ms  1.45ms  2.38ms  17sec   CUDA 8.0, cuDNN v5.0, Core i7 980X 3.3GHz 6-core
GTX 1080    0.87ms  1.79ms  2.65ms  19sec   CUDA 8.0, cuDNN v5.1, Core i7 980X 3.3GHz 6-core
GTX 980     0.60ms  1.51ms  2.80ms  24sec   CUDA 7.5, cuDNN v5.0, Xeon W3680 3.3GHz 6-core
GTX 980     0.98ms  3.06ms  5.37ms  24sec   CUDA 7.0, cuDNN v3.0, Xeon W3680 3.3GHz 6-core
GRID K520   ------  5.0ms   14.7ms  44sec   CUDA 7.0, cuDNN v3.0, Amazon EC2 g2.8xlarge, Xeon E5-2650 v2 2.60GHz
Tesla K80   1.66ms  3.68ms  7.25ms  33sec   CUDA 7.5, cuDNN v5.1, Amazon EC2 p2.xlarge, Xeon E5-2686 v4 2.30GHz
cuDNN v5.0 is much faster than cuDNN v3.0 on the GTX 980, while cuDNN v5.1
is a bit slower than cuDNN v5.0 on the GTX 1080.
However, the author of Hirabot reported that cuDNN v5.1 is faster on his setup:

cuDNN v5.0  2.43ms
cuDNN v5.1  1.87ms

(Windows 7, GTX 1060, i7-3770K overclocked to 4.2GHz, CUDA 8.0;
network: 23 layers of 3x3 128-filter convolutions plus one 3x3 1-filter layer, with shortcut connections)

I also tried an Amazon EC2 p2.xlarge instance, but it was not fast.
F256 means 256 filters with 49 input channels (policy network):
  256 filters 5x5 x1, 256 filters 3x3 x10, 1 filter 3x3 x1
F128 is the same network with 128 filters.
F32 means 32 filters with 50 input channels (value network):
  32 filters 5x5 x1, 32 filters 3x3 x7, 1 filter 1x1 x1, fully connected 256, fully connected tanh 1
Board size is 19x19 and mini-batch size is 1.
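To make the F256/F32 sizes concrete, here is a minimal Python sketch that counts parameters for the two networks as described above. It assumes one bias per filter and "same" padding that keeps the board at 19x19, so the value network's final 1x1-filter plane feeds 361 units into the first fully connected layer; those are my assumptions, not stated in the post.

```python
# Parameter counts for the F256 policy net and F32 value net (a sketch,
# assuming padding keeps the board 19x19 and each filter has one bias).

def conv_params(in_channels, filters, kernel):
    """Weights plus one bias per output filter for a square convolution."""
    return filters * in_channels * kernel * kernel + filters

def fc_params(in_units, out_units):
    """Weights plus biases for a fully connected layer."""
    return in_units * out_units + out_units

# F256 policy network: 49 input channels; 256 5x5 x1, 256 3x3 x10, 1 3x3 x1
policy = (conv_params(49, 256, 5)
          + 10 * conv_params(256, 256, 3)
          + conv_params(256, 1, 3))

# F32 value network: 50 input channels; 32 5x5 x1, 32 3x3 x7, 1 1x1 x1,
# then FC 256 and FC tanh 1 (the 19x19 = 361 plane feeds the first FC)
value = (conv_params(50, 32, 5)
         + 7 * conv_params(32, 32, 3)
         + conv_params(32, 1, 1)
         + fc_params(19 * 19, 256)
         + fc_params(256, 1))

print(policy, value)  # -> 6216961 197730
```

Under these assumptions the policy net has roughly 6.2M parameters and the value net about 0.2M, which is consistent with the value net (F32) being the fastest column in the table.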
The MNIST column is the training time for the MNIST example included with Caffe.
All results use Caffe on Ubuntu 14.04.
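For reference, the MNIST benchmark above can be reproduced with the scripts shipped in a standard Caffe checkout; this is a command recipe, assuming you run from the Caffe root directory after building Caffe.

```shell
# From the root of a built Caffe checkout:
./data/mnist/get_mnist.sh         # download the MNIST data
./examples/mnist/create_mnist.sh  # convert it to LMDB
./examples/mnist/train_lenet.sh   # train LeNet (the timed step above)
```

Timing the last script (e.g. with `time`) gives a number comparable to the MNIST column, though the exact figure depends on the solver settings in the checkout.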
Regards,
Hiroshi Yamashita
_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go