chinakook opened a new issue #17907: Depthwise in windows is 10 times slower 
than linux on gpu
URL: https://github.com/apache/incubator-mxnet/issues/17907
 
 
   ## Description
   Depthwise convolution on GPU is about 10x slower on Windows than on Linux.
   
   ### Error Message
   Windows:
   
   
   Name | Wall Duration | Self Time | Average Wall Duration | Occurrences
   -- | -- | -- | -- | --
   SyncCopyCPU2GPU | 12.673 ms | 12.673 ms | 0.325 ms | 39
   SyncCopyGPU2CPU | 10.121 ms | 10.121 ms | 0.087 ms | 117
   DeleteVariable | 1.603 ms | 1.603 ms | 0.002 ms | 1014
   BatchNorm | 90.036 ms | 90.036 ms | 0.066 ms | 1365
   Convolution | 3,134.914 ms | 3,134.914 ms | 1.710 ms | 1833
   Activation | 75.352 ms | 75.352 ms | 0.055 ms | 1365
   transpose | 30.556 ms | 30.556 ms | 0.060 ms | 507
   Flatten | 7.007 ms | 7.007 ms | 0.015 ms | 468
   slice_axis | 19.195 ms | 19.195 ms | 0.062 ms | 312
   _mul_scalar | 23.824 ms | 23.824 ms | 0.056 ms | 429
   zeros_like | 3.767 ms | 3.767 ms | 0.048 ms | 78
   where | 3.565 ms | 3.565 ms | 0.046 ms | 78
   slice_like | 14.537 ms | 14.537 ms | 0.062 ms | 234
   Reshape | 12.183 ms | 12.183 ms | 0.015 ms | 819
   SliceChannel | 5.935 ms | 5.935 ms | 0.076 ms | 78
   _plus_scalar | 11.447 ms | 11.447 ms | 0.059 ms | 195
   exp | 4.079 ms | 4.079 ms | 0.052 ms | 78
   Concat | 17.887 ms | 17.887 ms | 0.066 ms | 273
   broadcast_mul | 8.639 ms | 8.639 ms | 0.055 ms | 156
   _div_scalar | 3.986 ms | 3.986 ms | 0.051 ms | 78
   _contrib_box_nms | 48.896 ms | 48.896 ms | 1.254 ms | 39
   softmax | 2.686 ms | 2.686 ms | 0.069 ms | 39
   _greater_scalar | 2.339 ms | 2.339 ms | 0.060 ms | 39
   ones_like | 1.908 ms | 1.908 ms | 0.049 ms | 39
   broadcast_add | 4.345 ms | 4.345 ms | 0.056 ms | 78
   elemwise_add | 3.568 ms | 3.568 ms | 0.046 ms | 78
   broadcast_sub | 5.719 ms | 5.719 ms | 0.147 ms | 39
   broadcast_div | 2.637 ms | 2.637 ms | 0.068 ms | 39
   elemwise_sub | 3.369 ms | 3.369 ms | 0.043 ms | 78
   Totals | 3,566.773 ms | 3,566.773 ms | 0.357 ms | 9984
   
   
   
   Linux:
   
   Name | Wall Duration | Self Time | Average Wall Duration | Occurrences
   -- | -- | -- | -- | --
   SyncCopyGPU2CPU | 4.928 ms | 4.928 ms | 0.042 ms | 117
   SyncCopyCPU2GPU | 11.269 ms | 11.269 ms | 0.289 ms | 39
   Activation | 24.682 ms | 24.682 ms | 0.022 ms | 1131
   Convolution | 198.481 ms | 198.481 ms | 0.108 ms | 1833
   BatchNorm | 40.352 ms | 40.352 ms | 0.030 ms | 1365
   _FusedOp | 1,188.433 ms | 1,188.433 ms | 1.524 ms | 780
   transpose | 11.064 ms | 11.064 ms | 0.022 ms | 507
   Flatten | 2.973 ms | 2.973 ms | 0.006 ms | 468
   softmax | 1.125 ms | 1.125 ms | 0.029 ms | 39
   Concat | 9.672 ms | 9.672 ms | 0.035 ms | 273
   where | 1.262 ms | 1.262 ms | 0.016 ms | 78
   slice_axis | 5.564 ms | 5.564 ms | 0.020 ms | 273
   zeros_like | 0.599 ms | 0.599 ms | 0.015 ms | 39
   DeleteVariable | 3.213 ms | 3.213 ms | 0.003 ms | 1053
   Reshape | 0.605 ms | 0.605 ms | 0.005 ms | 117
   broadcast_mul | 3.145 ms | 3.145 ms | 0.020 ms | 156
   broadcast_add | 1.642 ms | 1.642 ms | 0.021 ms | 78
   broadcast_sub | 3.018 ms | 3.018 ms | 0.077 ms | 39
   broadcast_div | 1.526 ms | 1.526 ms | 0.039 ms | 39
   SliceChannel | 2.744 ms | 2.744 ms | 0.035 ms | 78
   _contrib_box_nms | 32.184 ms | 32.184 ms | 0.825 ms | 39
   _greater_scalar | 0.840 ms | 0.840 ms | 0.022 ms | 39
   Totals | 1,549.320 ms | 1,549.320 ms | 0.181 ms | 8580
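
To make the size of the gap concrete, the per-operator slowdown can be worked out from the two profiles above. The snippet below is only a sketch: the numbers are copied from the tables, while the dictionary layout and the `slowdown` helper are illustrative names, not part of MXNet or its profiler.

```python
# (total wall time in ms, number of calls), copied from the two profiles above.
windows = {
    "Convolution": (3134.914, 1833),
    "BatchNorm": (90.036, 1365),
    "Activation": (75.352, 1365),
    "Totals": (3566.773, 9984),
}
linux = {
    "Convolution": (198.481, 1833),
    "BatchNorm": (40.352, 1365),
    "Activation": (24.682, 1131),
    "Totals": (1549.320, 8580),
}


def slowdown(op):
    """Windows/Linux ratio of the average time per call for one operator."""
    win_ms, win_calls = windows[op]
    lin_ms, lin_calls = linux[op]
    return (win_ms / win_calls) / (lin_ms / lin_calls)


for op in ("Convolution", "BatchNorm", "Activation"):
    print(f"{op}: {slowdown(op):.1f}x slower per call on Windows")

# Ratio of total profiled time, ignoring the differing call counts.
total_ratio = windows["Totals"][0] / linux["Totals"][0]
print(f"Total profiled time: {total_ratio:.1f}x")
```

Convolution comes out at roughly 15.8x per call, while total profiled time differs by only about 2.3x. The totals are not strictly like-for-like, because the Linux profile contains a `_FusedOp` row (1,188.433 ms) that has no counterpart in the Windows profile.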
   
   
   
   ## To Reproduce
   MXNet 1.6.0 (official release).
   Run inference with an ssd_mobilenet1.0_custom model on a 300x300 input on GPU.
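
A smaller reproduction that isolates a single depthwise convolution might look like the sketch below. It is not the script used for the profiles above: the channel count, feature-map size, iteration counts, and the helper names `time_op` and `bench_depthwise` are all assumptions chosen for illustration. The depthwise case is expressed through `mx.nd.Convolution` with `num_group` equal to the number of channels.

```python
import time


def time_op(fn, sync, warmup=10, iters=100):
    """Return the mean wall time per call of fn, in milliseconds.

    sync() must block until all queued work has finished; this matters
    because GPU operators are launched asynchronously.
    """
    for _ in range(warmup):
        fn()
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()
    return (time.perf_counter() - start) * 1000.0 / iters


def bench_depthwise(channels=512, size=19, iters=100):
    """Time one 3x3 depthwise convolution (num_group == channels)."""
    # Imported here so time_op stays usable without MXNet installed.
    import mxnet as mx

    # Fall back to CPU when no GPU is visible.
    ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()
    data = mx.nd.random.uniform(shape=(1, channels, size, size), ctx=ctx)
    # Depthwise weights: one 3x3 filter per input channel.
    weight = mx.nd.random.uniform(shape=(channels, 1, 3, 3), ctx=ctx)

    def run():
        mx.nd.Convolution(data=data, weight=weight, kernel=(3, 3),
                          pad=(1, 1), num_filter=channels,
                          num_group=channels, no_bias=True)

    return time_op(run, mx.nd.waitall, iters=iters)
```

Running `print(bench_depthwise())` on the Windows and Linux machines with the same GPU and MXNet build would show whether the gap is specific to the depthwise kernel or comes from elsewhere in the model.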
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
