stu1130 edited a comment on issue #14725: Performance Regression on CUDA10
URL: https://github.com/apache/incubator-mxnet/issues/14725#issuecomment-486016229

I reran the minimal reproducible script shown above, set num = 100000, and got the following result under nvprof -s:

```
# CUDA 10.0 mxnet-cu100mkl
GPU activities:
 35.43%  7.72535s   99995  77.257us  71.167us  83.295us  volta_sgemm_32x32_sliced1x4_nn
 29.99%  6.53866s   99995  65.389us  62.623us  72.287us  volta_sgemm_128x64_nt
 16.11%  3.51241s  199990  17.562us  7.2000us  49.471us  [CUDA memcpy DtoH]
 13.52%  2.94757s   99995  29.477us  27.872us  34.559us  volta_sgemm_64x32_sliced1x4_tn
 ...
Average: 0.001091881209394027
Total: 109.18266153335571
------------------------------------------------------------------------------
# CUDA 9.2 mxnet-cu92mkl
GPU activities:
 44.88%  7.94254s   99995  79.429us  75.583us  84.703us  volta_sgemm_32x32_sliced1x4_nn
 19.34%  3.42300s  199990  17.115us  7.2950us  58.656us  [CUDA memcpy DtoH]
 17.95%  3.17554s   99995  31.757us  29.952us  38.655us  volta_sgemm_32x32_sliced1x4_tn
 12.94%  2.28917s   99995  22.892us  20.927us  29.280us  volta_sgemm_128x64_nt
 ...
Average: 0.0009327297395715428
Total: 93.26831030845642
```

We can see that **volta_sgemm_128x64_nt** on CUDA 10.0 took almost 3 times as long as on CUDA 9.2 (6.54s vs. 2.29s in total). The reason the overall totals are still comparable is that volta_sgemm_32x32_sliced1x4_nn dominates the execution time in this benchmark, which is not the case in the LSTM workload.
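The original reproducible script is not reproduced in this comment, but the timing loop it describes (run a dense matrix product `num` times, then report the average and total wall time) can be sketched as follows. This is a hypothetical illustration: `bench_dot`, the shapes, and the use of NumPy's `dot` as a CPU stand-in for `mx.nd.dot` are my assumptions, not the author's actual script, which runs on the GPU under nvprof.

```python
# Hypothetical sketch of the benchmark loop described above: time `num`
# iterations of an (m, k) x (k, n) matrix product and report the average
# and total wall time. NumPy's dot is a CPU stand-in for mx.nd.dot here;
# the real script forces GPU results (e.g. via asnumpy()) before timing.
import time

import numpy as np


def bench_dot(m, k, n, num=100):
    """Return (average, total) wall time over `num` matrix products."""
    a = np.random.rand(m, k).astype(np.float32)
    b = np.random.rand(k, n).astype(np.float32)
    start = time.time()
    for _ in range(num):
        c = a.dot(b)
        _ = float(c[0, 0])  # force the result, as asnumpy() would on GPU
    total = time.time() - start
    return total / num, total


avg, total = bench_dot(640, 650, 10000, num=10)
print(f"Average: {avg}")
print(f"Total: {total}")
```

Running a script like this under `nvprof -s python bench.py` produces per-kernel summaries of the kind quoted above.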
Here are the results for the other data-shape combinations.

* Note that all of them run only 100 times (num = 100).

```
data shape ('640,10000', '640,650', '10000,650')
# CUDA 10.0 mxnet-cu100mkl
GPU activities:
 54.06%  256.89ms  190  1.3521ms  133.44us  11.855ms  [CUDA memcpy DtoH]
 15.03%  71.430ms   95  751.90us  707.26us  837.63us  volta_sgemm_128x64_nn
 14.59%  69.329ms   95  729.77us  690.62us  809.12us  volta_sgemm_128x64_nt
 12.93%  61.456ms   95  646.91us  568.61us  730.72us  volta_sgemm_128x64_tn
  1.33%  6.3357ms   95  66.691us  65.983us  67.487us
# CUDA 9.2 mxnet-cu92mkl
GPU activities:
 56.68%  295.08ms  190  1.5531ms  133.41us  13.890ms  [CUDA memcpy DtoH]
 14.10%  73.427ms   95  772.92us  705.98us  839.36us  volta_sgemm_128x64_nn
 13.10%  68.185ms   95  717.74us  655.04us  775.13us  volta_sgemm_128x64_nt
 13.00%  67.684ms   95  712.47us  640.54us  778.40us  volta_sgemm_128x128_tn
------------------------------------------------------------------------------
data shape ('960,10000', '960,650', '10000,650')
# CUDA 10.0 mxnet-cu100mkl
GPU activities:
 79.61%  1.38828s  190  7.3068ms  289.41us  18.374ms  [CUDA memcpy DtoH]
  6.66%  116.15ms   95  1.2226ms  1.2185ms  1.2268ms  volta_sgemm_128x64_nn
  6.42%  111.92ms   95  1.1781ms  1.1731ms  1.1876ms  volta_sgemm_128x64_nt
  5.98%  104.26ms   95  1.0975ms  1.0937ms  1.1239ms  volta_sgemm_32x128_tn
  0.54%  9.4320ms   95  99.284us  98.719us  99.839us
# CUDA 9.2 mxnet-cu92mkl
GPU activities:
 80.65%  1.45573s  190  7.6617ms  270.69us  18.452ms  [CUDA memcpy DtoH]
  6.43%  116.12ms   95  1.2223ms  1.2162ms  1.2279ms  volta_sgemm_128x64_nn
  6.03%  108.86ms   95  1.1459ms  1.1444ms  1.1490ms  volta_sgemm_128x64_nt
  5.61%  101.24ms   95  1.0657ms  1.0618ms  1.0718ms  volta_sgemm_128x128_tn
------------------------------------------------------------------------------
data shape ('1600,10000', '1600,650', '10000,650')
# CUDA 10.0 mxnet-cu100mkl
GPU activities:
 81.71%  2.63850s  190  13.887ms  534.24us  32.062ms  [CUDA memcpy DtoH]
  6.48%  209.16ms   95  2.2017ms  2.1817ms  2.3835ms  volta_sgemm_128x64_nn
  5.67%  183.19ms   95  1.9283ms  1.9143ms  1.9402ms  volta_sgemm_128x64_nt
  5.01%  161.71ms   95  1.7023ms  1.6479ms  1.7094ms  volta_sgemm_128x64_tn
# CUDA 9.2 mxnet-cu92mkl
GPU activities:
 81.32%  2.57187s  190  13.536ms  505.56us  32.412ms  [CUDA memcpy DtoH]
  6.63%  209.82ms   95  2.2086ms  2.1808ms  2.3828ms  volta_sgemm_128x64_nn
  5.71%  180.48ms   95  1.8998ms  1.8987ms  1.9037ms  volta_sgemm_128x64_nt
  5.19%  164.03ms   95  1.7266ms  1.7225ms  1.7318ms  volta_sgemm_128x128_tn
------------------------------------------------------------------------------
data shape ('1280,10000', '1280,650', '10000,650')
# CUDA 10.0 mxnet-cu100mkl
GPU activities:
 82.14%  2.12338s  190  11.176ms  451.01us  26.345ms  [CUDA memcpy DtoH]
  5.94%  153.49ms   95  1.6157ms  1.6120ms  1.6218ms  volta_sgemm_128x64_nn
  5.71%  147.61ms   95  1.5538ms  1.5445ms  1.5637ms  volta_sgemm_128x64_nt
  5.06%  130.83ms   95  1.3772ms  1.3723ms  1.3809ms  volta_sgemm_32x128_tn
# CUDA 9.2 mxnet-cu92mkl
GPU activities:
 82.17%  2.08856s  190  10.992ms  467.93us  26.238ms  [CUDA memcpy DtoH]
  5.99%  152.15ms   95  1.6016ms  1.5995ms  1.6043ms  volta_sgemm_128x64_nn
  5.70%  144.78ms   95  1.5240ms  1.5219ms  1.5288ms  volta_sgemm_128x64_nt
  4.98%  126.54ms   95  1.3320ms  1.3283ms  1.3399ms  volta_sgemm_128x128_tn
------------------------------------------------------------------------------
data shape ('320,10000', '320,650', '10000,650')
# CUDA 10.0 mxnet-cu100mkl
GPU activities:
 58.78%  186.02ms  190  979.07us  65.055us  6.4167ms  [CUDA memcpy DtoH]
 14.23%  45.045ms   95  474.16us  457.02us  492.99us  volta_sgemm_128x64_nt
 12.23%  38.704ms   95  407.41us  348.99us  540.25us  volta_sgemm_128x128_tn
 11.78%  37.298ms   95  392.61us  359.84us  424.28us  volta_sgemm_128x64_nn
# CUDA 9.2 mxnet-cu92mkl
GPU activities:
 62.66%  207.77ms  190  1.0935ms  65.087us  6.6399ms  [CUDA memcpy DtoH]
 11.83%  39.221ms   95  412.86us  349.63us  540.79us  volta_sgemm_128x128_tn
 11.48%  38.053ms   95  400.56us  360.64us  424.86us  volta_sgemm_128x64_nn
 11.14%  36.947ms   95  388.92us  354.17us  423.29us  volta_sgemm_128x64_nt
------------------------------------------------------------------------------
data shape ('1920,10000', '1920,650', '10000,650')
# CUDA 10.0 mxnet-cu100mkl
GPU activities:
 82.80%  3.29555s  190  17.345ms  664.22us  40.348ms  [CUDA memcpy DtoH]
  5.70%  227.01ms   95  2.3896ms  2.3852ms  2.3959ms  volta_sgemm_128x64_nn
  5.52%  219.56ms   95  2.3111ms  2.2841ms  2.3172ms  volta_sgemm_128x64_nt
  4.81%  191.49ms   95  2.0157ms  1.9679ms  2.0367ms  volta_sgemm_128x64_tn
# CUDA 9.2 mxnet-cu92mkl
GPU activities:
 80.50%  2.80315s  190  14.753ms  664.44us  36.470ms  [CUDA memcpy DtoH]
  6.52%  227.09ms   95  2.3904ms  2.3850ms  2.3956ms  volta_sgemm_128x64_nn
  6.21%  216.39ms   95  2.2778ms  2.2743ms  2.2854ms  volta_sgemm_128x64_nt
  5.44%  189.46ms   95  1.9943ms  1.9876ms  2.0082ms  volta_sgemm_128x128_tn
```
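Comparing the same kernel across the two builds by eye is error-prone, so a small parser over the nvprof summary lines can help. This is a sketch under the assumption that each row is whitespace-separated as shown above (percent, total time, calls, avg, min, max, name); `parse_time` and `kernel_totals` are names I made up for illustration.

```python
# Sketch: parse nvprof summary rows (as quoted above) into {kernel: total
# seconds}, so the same kernel can be compared across CUDA builds. Assumed
# column layout: percent, total time, calls, avg, min, max, kernel name.
import re


def parse_time(s):
    """Convert an nvprof time like '6.53866s', '1.5531ms', '65.389us' to seconds."""
    m = re.match(r"([0-9.]+)(s|ms|us|ns)$", s)
    value, unit = float(m.group(1)), m.group(2)
    return value * {"s": 1.0, "ms": 1e-3, "us": 1e-6, "ns": 1e-9}[unit]


def kernel_totals(summary):
    """Map each kernel name to its total time in seconds."""
    totals = {}
    for line in summary.strip().splitlines():
        parts = line.split()
        if len(parts) < 7 or not parts[0].endswith("%"):
            continue  # skip headers and truncated rows
        name = " ".join(parts[6:])  # keeps names like "[CUDA memcpy DtoH]" intact
        totals[name] = parse_time(parts[1])
    return totals


# The volta_sgemm_128x64_nt rows from the num = 100000 run above:
cu100 = kernel_totals("29.99%  6.53866s  99995  65.389us  62.623us  72.287us  volta_sgemm_128x64_nt")
cu92 = kernel_totals("12.94%  2.28917s  99995  22.892us  20.927us  29.280us  volta_sgemm_128x64_nt")
ratio = cu100["volta_sgemm_128x64_nt"] / cu92["volta_sgemm_128x64_nt"]
print(f"CUDA 10 / CUDA 9.2 for volta_sgemm_128x64_nt: {ratio:.2f}x")  # ~2.86x
```

Applied to the first profile, this reproduces the roughly 3x gap on volta_sgemm_128x64_nt between the two builds.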
