On 15/12/2016 12:35, Hiroshi Yamashita wrote:
>           F32     F128    F256    MNIST
> GTX 1080  0.48ms  1.45ms  2.38ms  17sec,  CUDA 8.0, cuDNN v5.0, Core i7 980X 3.3GHz 6core
> GTX 1080  0.87ms  1.79ms  2.65ms  19sec,  CUDA 8.0, cuDNN v5.1, Core i7 980X 3.3GHz 6core
> GTX 980   0.60ms  1.51ms  2.80ms  24sec,  CUDA 7.5, cuDNN v5.0, Xeon W3680 3.3GHz 6core

The speedup from the GTX 980 to the GTX 1080 is very bad, isn't it? The
newer card has almost 100% more theoretical FLOPS, and most of that
increase comes from higher clock speeds and more shaders (and less so
from uarch changes), so the extra FLOPS should be observable at least
for the big networks.
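
To put rough numbers on it (the FP32 figures below are approximate
public boost-clock specs, not measurements from this thread):

#include <cstdio>

int main() {
    // Approximate FP32 throughput at boost clocks (rough public specs).
    const double tflops_980  = 4.6;   /* GTX 980  */
    const double tflops_1080 = 8.9;   /* GTX 1080 */

    // F256 timings taken from the cuDNN v5.0 rows of the table above.
    const double ms_980  = 2.80;
    const double ms_1080 = 2.38;

    printf("theoretical speedup: %.2fx\n", tflops_1080 / tflops_980); // ~1.93x
    printf("observed speedup:    %.2fx\n", ms_980 / ms_1080);         // ~1.18x
    return 0;
}

A ~1.9x FLOPS advantage that buys only ~1.2x in practice points at a
fixed per-evaluation cost the bigger GPU cannot amortize.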

I think you are entirely limited by setup/CPU/driver/API overhead. The
Hirabot network is bigger, its author has a smaller GPU, and his CPU is
faster, all of which reduces the relative CPU overhead.

I suspect these overheads are also very large in cuDNN with a mini-batch
size of 1. My OpenCL code does not use the Winograd optimization and
runs generic, identical kernels on both AMD and NVIDIA cards, yet its
performance is very similar to cuDNN v5.1's. This seems to indicate that
GPU processing is not the actual bottleneck.
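
You can measure that fixed cost directly. Below is a minimal sketch (not
code from my engine; the kernel and the buffer size are placeholders)
that times a synchronous launch round trip, which is roughly the call
pattern of mini-batch-1 inference:

#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: the work is deliberately trivial so that
// launch/driver/API overhead dominates the measured time.
__global__ void dummy(float *out, const float *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 0.5f + 1.0f;
}

int main() {
    const int n = 19 * 19 * 256;  // one stack of Go feature planes (illustrative size)
    float *in, *out;
    cudaMalloc(&in,  n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    const int blocks = (n + 255) / 256;

    // Warm up so one-time driver/context costs don't pollute the numbers.
    dummy<<<blocks, 256>>>(out, in, n);
    cudaDeviceSynchronize();

    const int iters = 1000;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; i++) {
        dummy<<<blocks, 256>>>(out, in, n);
        cudaDeviceSynchronize();  // batch-1 inference waits for every result
    }
    auto stop = std::chrono::steady_clock::now();

    double us = std::chrono::duration<double, std::micro>(stop - start).count() / iters;
    printf("%.1f us per synchronous launch\n", us);

    cudaFree(in);
    cudaFree(out);
    return 0;
}

Tens of microseconds per round trip would not surprise me, and a forward
pass issues one such call per layer, so at mini-batch 1 a sizeable
fraction of each 0.5-2.5ms evaluation can be overhead rather than
arithmetic.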

-- 
GCP
_______________________________________________
Computer-go mailing list
[email protected]
http://computer-go.org/mailman/listinfo/computer-go
