chinakook opened a new issue #17907: Depthwise in windows is 10 times slower than linux on gpu
URL: https://github.com/apache/incubator-mxnet/issues/17907

## Description
Depthwise in windows is 10x slower than linux on gpu

### Error Message
Windows:

Name | Wall Duration | Self time | Average Wall Duration | Occurrences
-- | -- | -- | -- | --
SyncCopyCPU2GPU | 12.673 ms | 12.673 ms | 0.325 ms | 39
SyncCopyGPU2CPU | 10.121 ms | 10.121 ms | 0.087 ms | 117
DeleteVariable | 1.603 ms | 1.603 ms | 0.002 ms | 1014
BatchNorm | 90.036 ms | 90.036 ms | 0.066 ms | 1365
Convolution | 3,134.914 ms | 3,134.914 ms | 1.710 ms | 1833
Activation | 75.352 ms | 75.352 ms | 0.055 ms | 1365
transpose | 30.556 ms | 30.556 ms | 0.060 ms | 507
Flatten | 7.007 ms | 7.007 ms | 0.015 ms | 468
slice_axis | 19.195 ms | 19.195 ms | 0.062 ms | 312
_mul_scalar | 23.824 ms | 23.824 ms | 0.056 ms | 429
zeros_like | 3.767 ms | 3.767 ms | 0.048 ms | 78
where | 3.565 ms | 3.565 ms | 0.046 ms | 78
slice_like | 14.537 ms | 14.537 ms | 0.062 ms | 234
Reshape | 12.183 ms | 12.183 ms | 0.015 ms | 819
SliceChannel | 5.935 ms | 5.935 ms | 0.076 ms | 78
_plus_scalar | 11.447 ms | 11.447 ms | 0.059 ms | 195
exp | 4.079 ms | 4.079 ms | 0.052 ms | 78
Concat | 17.887 ms | 17.887 ms | 0.066 ms | 273
broadcast_mul | 8.639 ms | 8.639 ms | 0.055 ms | 156
_div_scalar | 3.986 ms | 3.986 ms | 0.051 ms | 78
_contrib_box_nms | 48.896 ms | 48.896 ms | 1.254 ms | 39
softmax | 2.686 ms | 2.686 ms | 0.069 ms | 39
_greater_scalar | 2.339 ms | 2.339 ms | 0.060 ms | 39
ones_like | 1.908 ms | 1.908 ms | 0.049 ms | 39
broadcast_add | 4.345 ms | 4.345 ms | 0.056 ms | 78
elemwise_add | 3.568 ms | 3.568 ms | 0.046 ms | 78
broadcast_sub | 5.719 ms | 5.719 ms | 0.147 ms | 39
broadcast_div | 2.637 ms | 2.637 ms | 0.068 ms | 39
elemwise_sub | 3.369 ms | 3.369 ms | 0.043 ms | 78
Totals | 3,566.773 ms | 3,566.773 ms | 0.357 ms | 9984

Linux:

Name | Wall Duration | Self time | Average Wall Duration | Occurrences
-- | -- | -- | -- | --
SyncCopyGPU2CPU | 4.928 ms | 4.928 ms | 0.042 ms | 117
SyncCopyCPU2GPU | 11.269 ms | 11.269 ms | 0.289 ms | 39
Activation | 24.682 ms | 24.682 ms | 0.022 ms | 1131
Convolution | 198.481 ms | 198.481 ms | 0.108 ms | 1833
BatchNorm | 40.352 ms | 40.352 ms | 0.030 ms | 1365
_FusedOp | 1,188.433 ms | 1,188.433 ms | 1.524 ms | 780
transpose | 11.064 ms | 11.064 ms | 0.022 ms | 507
Flatten | 2.973 ms | 2.973 ms | 0.006 ms | 468
softmax | 1.125 ms | 1.125 ms | 0.029 ms | 39
Concat | 9.672 ms | 9.672 ms | 0.035 ms | 273
where | 1.262 ms | 1.262 ms | 0.016 ms | 78
slice_axis | 5.564 ms | 5.564 ms | 0.020 ms | 273
zeros_like | 0.599 ms | 0.599 ms | 0.015 ms | 39
DeleteVariable | 3.213 ms | 3.213 ms | 0.003 ms | 1053
Reshape | 0.605 ms | 0.605 ms | 0.005 ms | 117
broadcast_mul | 3.145 ms | 3.145 ms | 0.020 ms | 156
broadcast_add | 1.642 ms | 1.642 ms | 0.021 ms | 78
broadcast_sub | 3.018 ms | 3.018 ms | 0.077 ms | 39
broadcast_div | 1.526 ms | 1.526 ms | 0.039 ms | 39
SliceChannel | 2.744 ms | 2.744 ms | 0.035 ms | 78
_contrib_box_nms | 32.184 ms | 32.184 ms | 0.825 ms | 39
_greater_scalar | 0.840 ms | 0.840 ms | 0.022 ms | 39
Totals | 1,549.320 ms | 1,549.320 ms | 0.181 ms | 8580

## To Reproduce
mxnet 1.6.0 official
predict a ssd_mobienet1.0_custom model with 300x300 on gpu
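
To put a number on the gap, the ratios can be worked out directly from the profiler totals reported above. The snippet below is only a rough sketch: the figures are copied from the Windows and Linux tables, while the dictionary names and the `slowdown` helper are made up for illustration. The comparison is probably not like for like, since the Linux run reports a `_FusedOp` row (and fewer `Activation` calls) that the Windows run does not; `Convolution` is the cleanest row to compare because both runs report 1833 occurrences.

```python
# Wall-duration totals in milliseconds, copied from the two profiler tables.
# Only a few rows are included; the names below are illustrative.
windows_ms = {
    "Convolution": 3134.914,
    "BatchNorm": 90.036,
    "Activation": 75.352,
    "Totals": 3566.773,
}
linux_ms = {
    "Convolution": 198.481,
    "BatchNorm": 40.352,
    "Activation": 24.682,
    "Totals": 1549.320,
}


def slowdown(op):
    """Return how many times slower the Windows row is than the Linux row."""
    return windows_ms[op] / linux_ms[op]


if __name__ == "__main__":
    for op in windows_ms:
        print(f"{op}: {slowdown(op):.1f}x slower on Windows")
```

With these inputs the `Convolution` row comes out at roughly 16x, while the overall total is only a little over 2x, which suggests the regression is concentrated in the convolution kernels rather than spread across all operators.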

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
