On 15/12/2016 12:35, Hiroshi Yamashita wrote:
>             F32      F128     F256     MNIST
> GTX 1080    0.48ms   1.45ms   2.38ms   17sec   CUDA 8.0, cuDNN v5.0, Core i7 980X 3.3GHz 6core
> GTX 1080    0.87ms   1.79ms   2.65ms   19sec   CUDA 8.0, cuDNN v5.1, Core i7 980X 3.3GHz 6core
> GTX 980     0.60ms   1.51ms   2.80ms   24sec   CUDA 7.5, cuDNN v5.0, Xeon W3680 3.3GHz 6core
The speedup from the GTX 980 to the GTX 1080 is very poor, isn't it? The card has almost 100% more theoretical FLOPS (roughly 8.9 vs. 4.6 TFLOPS peak FP32), and most of that increase comes from higher clock speeds and more shaders (less so from microarchitectural changes), so the extra FLOPS should be observable at least for the big networks. I think you are entirely limited by setup/CPU/driver/API overhead.

The Hirabot network is bigger, the author has a smaller GPU, and his CPU is faster, all of which reduces the relative CPU overhead. I suspect these overheads are also very large in cuDNN with a mini-batch size of 1.

My OpenCL code does not use the Winograd optimization and is generic, identical code for AMD and NVIDIA cards, yet its performance is very similar to cuDNN v5.1's. This suggests that GPU processing is not the actual bottleneck.
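As an illustration, here is a minimal micro-benchmark sketch (not my engine code; the dummy kernel, sizes, and iteration count are arbitrary stand-ins for one conv layer) of the measurement that separates the two costs: device time from CUDA events for back-to-back launches, versus wall-clock time with a synchronization after every call, which is how a batch-1 search uses the net. The gap between the two per-call numbers is the host-side launch/driver/API overhead.

#include <cstdio>
#include <chrono>
#include <cuda_runtime.h>

// Trivial stand-in for one convolution layer; the work per launch doesn't
// matter here, only the fixed cost of getting each call onto the GPU.
__global__ void dummy_layer(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = buf[i] * 0.99f + 0.01f;
}

int main() {
    const int n = 1 << 20;
    const int iters = 1000;
    const int blocks = (n + 255) / 256;

    float *buf;
    cudaMalloc(&buf, n * sizeof(float));
    cudaMemset(buf, 0, n * sizeof(float));

    // Warm-up launch so context/JIT setup doesn't pollute the timings.
    dummy_layer<<<blocks, 256>>>(buf, n);
    cudaDeviceSynchronize();

    // Device-side time: kernels queued back to back, timed with events.
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        dummy_layer<<<blocks, 256>>>(buf, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float gpu_ms = 0.0f;
    cudaEventElapsedTime(&gpu_ms, start, stop);

    // Wall-clock time with a sync after every call, the way a batch-1
    // search uses the net: it needs each result before the next position.
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        dummy_layer<<<blocks, 256>>>(buf, n);
        cudaDeviceSynchronize();
    }
    auto t1 = std::chrono::steady_clock::now();
    double wall_ms =
        std::chrono::duration<double, std::milli>(t1 - t0).count();

    printf("per call: device %.3f ms, wall %.3f ms, host overhead %.3f ms\n",
           gpu_ms / iters, wall_ms / iters, (wall_ms - gpu_ms) / iters);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(buf);
    return 0;
}

If that overhead column comes out anywhere near the 0.5-1ms per position shown in the table above, a faster GPU simply cannot show up in these timings.

--
GCP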
