PistonY opened a new issue #13709: Why FP16 training speed is too slow on Tesla T4 in Gluon?
URL: https://github.com/apache/incubator-mxnet/issues/13709

Hi, I tried to train with FP16 on a Tesla T4, but it is slower than a GTX 1070 with FP32. Could you please give me some suggestions to solve this? The T4 is on mxnet-cu100mkl and the GTX 1070 is on mxnet-cu90mkl.

Here are my script and logs:

code: https://gist.github.com/PistonY/8dfcefdc46b747afd4d18b37f9a18665

T4 log:
```
INFO:root:Iter 390. Loss: 2.14372, Train RMSE 0.23653.Time 00:05:47.lr 0.019948717948717953
INFO:root:Test Loss: 1.935017, Test acc 0.327200.
INFO:root:Iter 780. Loss: 1.89404, Train RMSE 0.22111.Time 00:05:52.lr 0.03994871794871795
INFO:root:Test Loss: 1.460350, Test acc 0.473100.
INFO:root:Iter 1170. Loss: 1.72982, Train RMSE 0.20837.Time 00:05:49.lr 0.05994871794871795
INFO:root:Test Loss: 1.288763, Test acc 0.559500.
INFO:root:Iter 1560. Loss: 1.57620, Train RMSE 0.19388.Time 00:05:48.lr 0.07994871794871795
INFO:root:Test Loss: 1.856537, Test acc 0.530100.
```

GTX 1070 log:
```
INFO:root:Epoch 0, Iter 390. Loss: 2.12699, Train RMSE 0.23722.Time 00:03:00.lr 0.019948717948717953
INFO:root:Test Loss: 1.746372, Test acc 0.361800.
```
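The script itself lives in the linked gist, so as general FP16 context only (not taken from the post): besides kernel speed, FP16 training usually needs loss scaling, because gradients that are fine in FP32 can underflow to zero in half precision. A minimal NumPy sketch of that pitfall, with a hypothetical scale factor of 1024:

```python
import numpy as np

# A small but meaningful gradient value, representable in float32.
grad_fp32 = np.float32(1e-8)

# Cast to half precision: the value underflows to zero and the update is lost.
grad_fp16 = np.float16(grad_fp32)
print(grad_fp16)  # 0.0

# Loss scaling: multiply before casting to float16, divide back out in float32.
scale = 1024.0  # hypothetical scale factor for illustration
scaled = np.float16(grad_fp32 * scale)   # 1.024e-5 is representable in float16
recovered = np.float32(scaled) / scale   # approximately the original 1e-8

print(scaled, recovered)
```

Whether FP16 is actually *faster* on a given GPU is a separate question from correctness; it depends on the kernels the framework dispatches, not just the dtype of the arrays.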
----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]

With regards,
Apache Git Services
