mahmoodn commented on issue #16001: Low kernel performance
URL: https://github.com/apache/incubator-mxnet/issues/16001#issuecomment-524808879
 
 
   Yes. Back in April or May I used a git build and ran it on an M2000 with CUDA 10.
   Anyway, I tried once more, and the full details are below. I would appreciate any feedback.
   
   
   Clone
   ```
   $ git clone --recursive https://github.com/apache/incubator-mxnet mxnet
   Cloning into 'mxnet'...
   remote: Enumerating objects: 23, done.
   remote: Counting objects: 100% (23/23), done.
   remote: Compressing objects: 100% (23/23), done.
   remote: Total 98735 (delta 8), reused 7 (delta 0), pack-reused 98712
   Receiving objects: 100% (98735/98735), 61.52 MiB | 327.00 KiB/s, done.
   Resolving deltas: 100% (65738/65738), done.
   Submodule '3rdparty/dlpack' (https://github.com/dmlc/dlpack) registered for path '3rdparty/dlpack'
   Submodule '3rdparty/dmlc-core' (https://github.com/dmlc/dmlc-core.git) registered for path '3rdparty/dmlc-core'
   Submodule '3rdparty/googletest' (https://github.com/google/googletest.git) registered for path '3rdparty/googletest'
   Submodule '3rdparty/mkldnn' (https://github.com/intel/mkl-dnn.git) registered for path '3rdparty/mkldnn'
   Submodule '3rdparty/nvidia_cub' (https://github.com/NVlabs/cub.git) registered for path '3rdparty/nvidia_cub'
   Submodule '3rdparty/onnx-tensorrt' (https://github.com/onnx/onnx-tensorrt.git) registered for path '3rdparty/onnx-tensorrt'
   Submodule '3rdparty/openmp' (https://github.com/llvm-mirror/openmp) registered for path '3rdparty/openmp'
   Submodule '3rdparty/ps-lite' (https://github.com/dmlc/ps-lite) registered for path '3rdparty/ps-lite'
   Submodule '3rdparty/tvm' (https://github.com/dmlc/tvm) registered for path '3rdparty/tvm'
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/dlpack'...
   remote: Enumerating objects: 22, done.
   remote: Counting objects: 100% (22/22), done.
   remote: Compressing objects: 100% (15/15), done.
   remote: Total 162 (delta 5), reused 12 (delta 3), pack-reused 140
   Receiving objects: 100% (162/162), 60.51 KiB | 512.00 KiB/s, done.
   Resolving deltas: 100% (52/52), done.
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/dmlc-core'...
   remote: Enumerating objects: 26, done.
   remote: Counting objects: 100% (26/26), done.
   remote: Compressing objects: 100% (21/21), done.
   remote: Total 5765 (delta 5), reused 15 (delta 3), pack-reused 5739
   Receiving objects: 100% (5765/5765), 1.45 MiB | 659.00 KiB/s, done.
   Resolving deltas: 100% (3491/3491), done.
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/googletest'...
   remote: Enumerating objects: 17947, done.
   remote: Total 17947 (delta 0), reused 0 (delta 0), pack-reused 17947
   Receiving objects: 100% (17947/17947), 6.31 MiB | 423.00 KiB/s, done.
   Resolving deltas: 100% (13237/13237), done.
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/mkldnn'...
   remote: Enumerating objects: 4655, done.
   remote: Counting objects: 100% (4655/4655), done.
   remote: Compressing objects: 100% (2024/2024), done.
   remote: Total 57986 (delta 3289), reused 3290 (delta 2443), pack-reused 53331
   Receiving objects: 100% (57986/57986), 67.84 MiB | 416.00 KiB/s, done.
   Resolving deltas: 100% (46973/46973), done.
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/nvidia_cub'...
   remote: Enumerating objects: 4, done.
   remote: Counting objects: 100% (4/4), done.
   remote: Compressing objects: 100% (4/4), done.
   remote: Total 32675 (delta 0), reused 4 (delta 0), pack-reused 32671
   Receiving objects: 100% (32675/32675), 16.54 MiB | 407.00 KiB/s, done.
   Resolving deltas: 100% (28644/28644), done.
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/onnx-tensorrt'...
   remote: Enumerating objects: 4, done.
   remote: Counting objects: 100% (4/4), done.
   remote: Compressing objects: 100% (4/4), done.
   remote: Total 593 (delta 0), reused 2 (delta 0), pack-reused 589
   Receiving objects: 100% (593/593), 1.35 MiB | 462.00 KiB/s, done.
   Resolving deltas: 100% (381/381), done.
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/openmp'...
   remote: Enumerating objects: 174, done.
   remote: Counting objects: 100% (174/174), done.
   remote: Compressing objects: 100% (123/123), done.
   remote: Total 10366 (delta 81), reused 110 (delta 48), pack-reused 10192
   Receiving objects: 100% (10366/10366), 10.62 MiB | 385.00 KiB/s, done.
   Resolving deltas: 100% (7611/7611), done.
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/ps-lite'...
   remote: Enumerating objects: 165, done.
   remote: Counting objects: 100% (165/165), done.
   remote: Compressing objects: 100% (121/121), done.
   remote: Total 2359 (delta 97), reused 55 (delta 44), pack-reused 2194
   Receiving objects: 100% (2359/2359), 737.52 KiB | 478.00 KiB/s, done.
   Resolving deltas: 100% (1517/1517), done.
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/tvm'...
   remote: Enumerating objects: 46164, done.
   remote: Total 46164 (delta 0), reused 0 (delta 0), pack-reused 46164
   Receiving objects: 100% (46164/46164), 15.87 MiB | 377.00 KiB/s, done.
   Resolving deltas: 100% (31267/31267), done.
   Submodule path '3rdparty/dlpack': checked out 'b90e939072066c160b18ea1e7156537b8d3710f6'
   Submodule path '3rdparty/dmlc-core': checked out 'f1ff6cc117f4e95169a9f62be549c8fe3e15c20f'
   Submodule path '3rdparty/googletest': checked out 'eb9225ce361affe561592e0912320b9db84985d0'
   Submodule path '3rdparty/mkldnn': checked out 'd89bf4babd7cce7efa6613387dca79c123164084'
   Submodule path '3rdparty/nvidia_cub': checked out 'c3cceac115c072fb63df1836ff46d8c60d9eb304'
   Submodule path '3rdparty/onnx-tensorrt': checked out '1e209e546061173ccc37b25bbca69a795c6c86e4'
   Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path '3rdparty/onnx-tensorrt/third_party/onnx'
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/onnx-tensorrt/third_party/onnx'...
   remote: Enumerating objects: 18449, done.
   remote: Total 18449 (delta 0), reused 0 (delta 0), pack-reused 18449
   Receiving objects: 100% (18449/18449), 9.52 MiB | 341.00 KiB/s, done.
   Resolving deltas: 100% (10031/10031), done.
   Submodule path '3rdparty/onnx-tensorrt/third_party/onnx': checked out '765f5ee823a67a866f4bd28a9860e81f3c811ce8'
   Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/benchmark'
   Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11'
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/onnx-tensorrt/third_party/onnx/third_party/benchmark'...
   remote: Enumerating objects: 26, done.
   remote: Counting objects: 100% (26/26), done.
   remote: Compressing objects: 100% (25/25), done.
   remote: Total 5262 (delta 11), reused 5 (delta 1), pack-reused 5236
   Receiving objects: 100% (5262/5262), 1.68 MiB | 373.00 KiB/s, done.
   Resolving deltas: 100% (3458/3458), done.
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11'...
   remote: Enumerating objects: 10987, done.
   remote: Total 10987 (delta 0), reused 0 (delta 0), pack-reused 10987
   Receiving objects: 100% (10987/10987), 4.02 MiB | 272.00 KiB/s, done.
   Resolving deltas: 100% (7426/7426), done.
   Submodule path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508'
   Submodule path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11': checked out 'a1041190c8b8ff0cd9e2f0752248ad5e3789ea0c'
   Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'...
   remote: Enumerating objects: 353, done.
   remote: Total 353 (delta 0), reused 0 (delta 0), pack-reused 353
   Receiving objects: 100% (353/353), 119.74 KiB | 273.00 KiB/s, done.
   Resolving deltas: 100% (149/149), done.
   Submodule path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
   Submodule path '3rdparty/openmp': checked out '37c72127e90360a020f351f18d9cccfc30e5145a'
   Submodule path '3rdparty/ps-lite': checked out '8a763892a973afc1acd3d4b469d05bb338a83a6e'
   Submodule path '3rdparty/tvm': checked out 'afd4b3e4450984358e9d79a7e8e578483cb7b017'
   Submodule 'dlpack' (https://github.com/dmlc/dlpack) registered for path '3rdparty/tvm/3rdparty/dlpack'
   Submodule 'dmlc-core' (https://github.com/dmlc/dmlc-core) registered for path '3rdparty/tvm/3rdparty/dmlc-core'
   Submodule '3rdparty/rang' (https://github.com/agauniyal/rang) registered for path '3rdparty/tvm/3rdparty/rang'
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/tvm/3rdparty/dlpack'...
   remote: Enumerating objects: 22, done.
   remote: Counting objects: 100% (22/22), done.
   remote: Compressing objects: 100% (15/15), done.
   remote: Total 162 (delta 5), reused 12 (delta 3), pack-reused 140
   Receiving objects: 100% (162/162), 60.51 KiB | 211.00 KiB/s, done.
   Resolving deltas: 100% (52/52), done.
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/tvm/3rdparty/dmlc-core'...
   remote: Enumerating objects: 26, done.
   remote: Counting objects: 100% (26/26), done.
   remote: Compressing objects: 100% (21/21), done.
   remote: Total 5765 (delta 5), reused 15 (delta 3), pack-reused 5739
   Receiving objects: 100% (5765/5765), 1.45 MiB | 508.00 KiB/s, done.
   Resolving deltas: 100% (3491/3491), done.
   Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/tvm/3rdparty/rang'...
   remote: Enumerating objects: 704, done.
   remote: Total 704 (delta 0), reused 0 (delta 0), pack-reused 704
   Receiving objects: 100% (704/704), 256.14 KiB | 488.00 KiB/s, done.
   Resolving deltas: 100% (362/362), done.
   Submodule path '3rdparty/tvm/3rdparty/dlpack': checked out '0acb731e0e43d15deee27b66f10e4c5b4e667913'
   Submodule path '3rdparty/tvm/3rdparty/dmlc-core': checked out '3943914eed66470bd010df581e29e4dca4f7df6f'
   Submodule path '3rdparty/tvm/3rdparty/rang': checked out 'cabe04d6d6b05356fa8f9741704924788f0dd762'
   ```
   
   Make
   ```
   $ make -j 2 USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1
   ```
   The full build output is available [here](https://srv-file6.gofile.io/download/hgIVBA/nohup.zip).
   
   Install
   ```
   $ pip install --user -e .
   Obtaining file:///home/mh.naderan/mx/mxnet/python
   Collecting numpy<2.0.0,>1.16.0 (from mxnet==1.6.0)
     Using cached https://files.pythonhosted.org/packages/1f/c7/198496417c9c2f6226616cff7dedf2115a4f4d0276613bab842ec8ac1e23/numpy-1.16.4-cp27-cp27mu-manylinux1_x86_64.whl
   Collecting requests<3,>=2.20.0 (from mxnet==1.6.0)
     Using cached https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl
   Collecting graphviz<0.9.0,>=0.8.1 (from mxnet==1.6.0)
     Using cached https://files.pythonhosted.org/packages/53/39/4ab213673844e0c004bed8a0781a0721a3f6bb23eb8854ee75c236428892/graphviz-0.8.4-py2.py3-none-any.whl
   Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests<3,>=2.20.0->mxnet==1.6.0)
     Using cached https://files.pythonhosted.org/packages/e6/60/247f23a7121ae632d62811ba7f273d0e58972d75e58a94d329d51550a47d/urllib3-1.25.3-py2.py3-none-any.whl
   Collecting certifi>=2017.4.17 (from requests<3,>=2.20.0->mxnet==1.6.0)
     Using cached https://files.pythonhosted.org/packages/69/1b/b853c7a9d4f6a6d00749e94eb6f3a041e342a885b87340b79c1ef73e3a78/certifi-2019.6.16-py2.py3-none-any.whl
   Collecting chardet<3.1.0,>=3.0.2 (from requests<3,>=2.20.0->mxnet==1.6.0)
     Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
   Collecting idna<2.9,>=2.5 (from requests<3,>=2.20.0->mxnet==1.6.0)
     Using cached https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl
   Installing collected packages: numpy, urllib3, certifi, chardet, idna, requests, graphviz, mxnet
     Running setup.py develop for mxnet
   Successfully installed certifi-2019.6.16 chardet-3.0.4 graphviz-0.8.4 idna-2.8 mxnet numpy-1.16.4 requests-2.22.0 urllib3-1.25.3
   
   ```
   
   Run command
   ```
   $ cd ../example/cnn_text_classification/
   $ nvprof -o run.visual.profiler.nvvp python text_cnn.py --num-epochs=1 --gpus=0
   ```
   
   Please download the profiler output from [here](https://srv-file6.gofile.io/download/hgIVBA/run.visual.profiler.nvvp.gz) and open it with the NVIDIA Visual Profiler.
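
For anyone who would rather not open the GUI, here is a minimal sketch of how the per-kernel totals could be pulled out of the profile with a script. It assumes the decompressed `.nvvp` file is an SQLite database that stores kernel records in a table named `CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL` (with `name`, `start`, and `end` columns) and kernel names in `StringTable` (`_id_`, `value`). Those table and column names are assumptions about the nvprof export layout and should be checked against the actual file (for example with `.schema` in the `sqlite3` shell). `summarize_kernels` is a hypothetical helper, not part of MXNet or the CUDA toolkit.

```python
import sqlite3


def summarize_kernels(db_path,
                      table="CUPTI_ACTIVITY_KIND_CONCURRENT_KERNEL",
                      limit=10):
    """Return (kernel_name, launches, total_time) rows, largest total first.

    The schema (table name, `name`/`start`/`end` columns, `StringTable`)
    is an assumption about the nvprof SQLite export; adjust as needed.
    Times are in whatever unit the profile stores (assumed nanoseconds).
    """
    # Table names cannot be bound as SQL parameters, so `table` is
    # interpolated; pass only trusted values here.
    query = (
        'SELECT s.value, COUNT(*), SUM(k."end" - k."start") AS total '
        'FROM {} AS k JOIN StringTable AS s ON k.name = s._id_ '
        'GROUP BY s.value ORDER BY total DESC LIMIT ?'
    ).format(table)
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py run.visual.profiler.nvvp
    if len(sys.argv) > 1:
        for name, launches, total in summarize_kernels(sys.argv[1]):
            print("{:>14} {:>8}  {}".format(total, launches, name))
```

The downloaded file is gzip-compressed, so it would need to be decompressed (e.g. with `gunzip`) before being passed to the script. Kernel names may appear mangled in the table.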
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
