mahmoodn commented on issue #16001: Low kernel performance
URL: https://github.com/apache/incubator-mxnet/issues/16001#issuecomment-524808879

Yes. I used a git version back in April or May and ran it on M2000 with cuda-10.
Anyway, I tried one more time and here are the full details. I appreciate any feedback.

clone
```
$ git clone --recursive https://github.com/apache/incubator-mxnet mxnet
Cloning into 'mxnet'...
remote: Enumerating objects: 23, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 98735 (delta 8), reused 7 (delta 0), pack-reused 98712
Receiving objects: 100% (98735/98735), 61.52 MiB | 327.00 KiB/s, done.
Resolving deltas: 100% (65738/65738), done.
Submodule '3rdparty/dlpack' (https://github.com/dmlc/dlpack) registered for path '3rdparty/dlpack'
Submodule '3rdparty/dmlc-core' (https://github.com/dmlc/dmlc-core.git) registered for path '3rdparty/dmlc-core'
Submodule '3rdparty/googletest' (https://github.com/google/googletest.git) registered for path '3rdparty/googletest'
Submodule '3rdparty/mkldnn' (https://github.com/intel/mkl-dnn.git) registered for path '3rdparty/mkldnn'
Submodule '3rdparty/nvidia_cub' (https://github.com/NVlabs/cub.git) registered for path '3rdparty/nvidia_cub'
Submodule '3rdparty/onnx-tensorrt' (https://github.com/onnx/onnx-tensorrt.git) registered for path '3rdparty/onnx-tensorrt'
Submodule '3rdparty/openmp' (https://github.com/llvm-mirror/openmp) registered for path '3rdparty/openmp'
Submodule '3rdparty/ps-lite' (https://github.com/dmlc/ps-lite) registered for path '3rdparty/ps-lite'
Submodule '3rdparty/tvm' (https://github.com/dmlc/tvm) registered for path '3rdparty/tvm'
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/dlpack'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 162 (delta 5), reused 12 (delta 3), pack-reused 140
Receiving objects: 100% (162/162), 60.51 KiB | 512.00 KiB/s, done.
Resolving deltas: 100% (52/52), done.
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/dmlc-core'...
remote: Enumerating objects: 26, done.
remote: Counting objects: 100% (26/26), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 5765 (delta 5), reused 15 (delta 3), pack-reused 5739
Receiving objects: 100% (5765/5765), 1.45 MiB | 659.00 KiB/s, done.
Resolving deltas: 100% (3491/3491), done.
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/googletest'...
remote: Enumerating objects: 17947, done.
remote: Total 17947 (delta 0), reused 0 (delta 0), pack-reused 17947
Receiving objects: 100% (17947/17947), 6.31 MiB | 423.00 KiB/s, done.
Resolving deltas: 100% (13237/13237), done.
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/mkldnn'...
remote: Enumerating objects: 4655, done.
remote: Counting objects: 100% (4655/4655), done.
remote: Compressing objects: 100% (2024/2024), done.
remote: Total 57986 (delta 3289), reused 3290 (delta 2443), pack-reused 53331
Receiving objects: 100% (57986/57986), 67.84 MiB | 416.00 KiB/s, done.
Resolving deltas: 100% (46973/46973), done.
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/nvidia_cub'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 32675 (delta 0), reused 4 (delta 0), pack-reused 32671
Receiving objects: 100% (32675/32675), 16.54 MiB | 407.00 KiB/s, done.
Resolving deltas: 100% (28644/28644), done.
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/onnx-tensorrt'...
remote: Enumerating objects: 4, done.
remote: Counting objects: 100% (4/4), done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 593 (delta 0), reused 2 (delta 0), pack-reused 589
Receiving objects: 100% (593/593), 1.35 MiB | 462.00 KiB/s, done.
Resolving deltas: 100% (381/381), done.
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/openmp'...
remote: Enumerating objects: 174, done.
remote: Counting objects: 100% (174/174), done.
remote: Compressing objects: 100% (123/123), done.
remote: Total 10366 (delta 81), reused 110 (delta 48), pack-reused 10192
Receiving objects: 100% (10366/10366), 10.62 MiB | 385.00 KiB/s, done.
Resolving deltas: 100% (7611/7611), done.
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/ps-lite'...
remote: Enumerating objects: 165, done.
remote: Counting objects: 100% (165/165), done.
remote: Compressing objects: 100% (121/121), done.
remote: Total 2359 (delta 97), reused 55 (delta 44), pack-reused 2194
Receiving objects: 100% (2359/2359), 737.52 KiB | 478.00 KiB/s, done.
Resolving deltas: 100% (1517/1517), done.
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/tvm'...
remote: Enumerating objects: 46164, done.
remote: Total 46164 (delta 0), reused 0 (delta 0), pack-reused 46164
Receiving objects: 100% (46164/46164), 15.87 MiB | 377.00 KiB/s, done.
Resolving deltas: 100% (31267/31267), done.
Submodule path '3rdparty/dlpack': checked out 'b90e939072066c160b18ea1e7156537b8d3710f6'
Submodule path '3rdparty/dmlc-core': checked out 'f1ff6cc117f4e95169a9f62be549c8fe3e15c20f'
Submodule path '3rdparty/googletest': checked out 'eb9225ce361affe561592e0912320b9db84985d0'
Submodule path '3rdparty/mkldnn': checked out 'd89bf4babd7cce7efa6613387dca79c123164084'
Submodule path '3rdparty/nvidia_cub': checked out 'c3cceac115c072fb63df1836ff46d8c60d9eb304'
Submodule path '3rdparty/onnx-tensorrt': checked out '1e209e546061173ccc37b25bbca69a795c6c86e4'
Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path '3rdparty/onnx-tensorrt/third_party/onnx'
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/onnx-tensorrt/third_party/onnx'...
remote: Enumerating objects: 18449, done.
remote: Total 18449 (delta 0), reused 0 (delta 0), pack-reused 18449
Receiving objects: 100% (18449/18449), 9.52 MiB | 341.00 KiB/s, done.
Resolving deltas: 100% (10031/10031), done.
Submodule path '3rdparty/onnx-tensorrt/third_party/onnx': checked out '765f5ee823a67a866f4bd28a9860e81f3c811ce8'
Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/benchmark'
Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11'
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/onnx-tensorrt/third_party/onnx/third_party/benchmark'...
remote: Enumerating objects: 26, done.
remote: Counting objects: 100% (26/26), done.
remote: Compressing objects: 100% (25/25), done.
remote: Total 5262 (delta 11), reused 5 (delta 1), pack-reused 5236
Receiving objects: 100% (5262/5262), 1.68 MiB | 373.00 KiB/s, done.
Resolving deltas: 100% (3458/3458), done.
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11'...
remote: Enumerating objects: 10987, done.
remote: Total 10987 (delta 0), reused 0 (delta 0), pack-reused 10987
Receiving objects: 100% (10987/10987), 4.02 MiB | 272.00 KiB/s, done.
Resolving deltas: 100% (7426/7426), done.
Submodule path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508'
Submodule path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11': checked out 'a1041190c8b8ff0cd9e2f0752248ad5e3789ea0c'
Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'...
remote: Enumerating objects: 353, done.
remote: Total 353 (delta 0), reused 0 (delta 0), pack-reused 353
Receiving objects: 100% (353/353), 119.74 KiB | 273.00 KiB/s, done.
Resolving deltas: 100% (149/149), done.
Submodule path '3rdparty/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
Submodule path '3rdparty/openmp': checked out '37c72127e90360a020f351f18d9cccfc30e5145a'
Submodule path '3rdparty/ps-lite': checked out '8a763892a973afc1acd3d4b469d05bb338a83a6e'
Submodule path '3rdparty/tvm': checked out 'afd4b3e4450984358e9d79a7e8e578483cb7b017'
Submodule 'dlpack' (https://github.com/dmlc/dlpack) registered for path '3rdparty/tvm/3rdparty/dlpack'
Submodule 'dmlc-core' (https://github.com/dmlc/dmlc-core) registered for path '3rdparty/tvm/3rdparty/dmlc-core'
Submodule '3rdparty/rang' (https://github.com/agauniyal/rang) registered for path '3rdparty/tvm/3rdparty/rang'
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/tvm/3rdparty/dlpack'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (15/15), done.
remote: Total 162 (delta 5), reused 12 (delta 3), pack-reused 140
Receiving objects: 100% (162/162), 60.51 KiB | 211.00 KiB/s, done.
Resolving deltas: 100% (52/52), done.
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/tvm/3rdparty/dmlc-core'...
remote: Enumerating objects: 26, done.
remote: Counting objects: 100% (26/26), done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 5765 (delta 5), reused 15 (delta 3), pack-reused 5739
Receiving objects: 100% (5765/5765), 1.45 MiB | 508.00 KiB/s, done.
Resolving deltas: 100% (3491/3491), done.
Cloning into '/home/mh.naderan/mx/mxnet/3rdparty/tvm/3rdparty/rang'...
remote: Enumerating objects: 704, done.
remote: Total 704 (delta 0), reused 0 (delta 0), pack-reused 704
Receiving objects: 100% (704/704), 256.14 KiB | 488.00 KiB/s, done.
Resolving deltas: 100% (362/362), done.
Submodule path '3rdparty/tvm/3rdparty/dlpack': checked out '0acb731e0e43d15deee27b66f10e4c5b4e667913'
Submodule path '3rdparty/tvm/3rdparty/dmlc-core': checked out '3943914eed66470bd010df581e29e4dca4f7df6f'
Submodule path '3rdparty/tvm/3rdparty/rang': checked out 'cabe04d6d6b05356fa8f9741704924788f0dd762'
```

make
```
make -j 2 USE_OPENCV=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1
```
Full output is available [here](https://srv-file6.gofile.io/download/hgIVBA/nohup.zip).

Install
```
$ pip install --user -e .
Obtaining file:///home/mh.naderan/mx/mxnet/python
Collecting numpy<2.0.0,>1.16.0 (from mxnet==1.6.0)
Using cached https://files.pythonhosted.org/packages/1f/c7/198496417c9c2f6226616cff7dedf2115a4f4d0276613bab842ec8ac1e23/numpy-1.16.4-cp27-cp27mu-manylinux1_x86_64.whl
Collecting requests<3,>=2.20.0 (from mxnet==1.6.0)
Using cached https://files.pythonhosted.org/packages/51/bd/23c926cd341ea6b7dd0b2a00aba99ae0f828be89d72b2190f27c11d4b7fb/requests-2.22.0-py2.py3-none-any.whl
Collecting graphviz<0.9.0,>=0.8.1 (from mxnet==1.6.0)
Using cached https://files.pythonhosted.org/packages/53/39/4ab213673844e0c004bed8a0781a0721a3f6bb23eb8854ee75c236428892/graphviz-0.8.4-py2.py3-none-any.whl
Collecting urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 (from requests<3,>=2.20.0->mxnet==1.6.0)
Using cached https://files.pythonhosted.org/packages/e6/60/247f23a7121ae632d62811ba7f273d0e58972d75e58a94d329d51550a47d/urllib3-1.25.3-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests<3,>=2.20.0->mxnet==1.6.0)
Using cached https://files.pythonhosted.org/packages/69/1b/b853c7a9d4f6a6d00749e94eb6f3a041e342a885b87340b79c1ef73e3a78/certifi-2019.6.16-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests<3,>=2.20.0->mxnet==1.6.0)
Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<2.9,>=2.5 (from requests<3,>=2.20.0->mxnet==1.6.0)
Using cached https://files.pythonhosted.org/packages/14/2c/cd551d81dbe15200be1cf41cd03869a46fe7226e7450af7a6545bfc474c9/idna-2.8-py2.py3-none-any.whl
Installing collected packages: numpy, urllib3, certifi, chardet, idna, requests, graphviz, mxnet
Running setup.py develop for mxnet
Successfully installed certifi-2019.6.16 chardet-3.0.4 graphviz-0.8.4 idna-2.8 mxnet numpy-1.16.4 requests-2.22.0 urllib3-1.25.3
```

Run command
```
$ cd ../example/cnn_text_classification/
$ nvprof -o run.visual.profiler.nvvp python text_cnn.py --num-epochs=1 --gpus=0
```
Please download the output file from [here](https://srv-file6.gofile.io/download/hgIVBA/run.visual.profiler.nvvp.gz) and open it with visual profiler.
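
Before reading the profile, it may help to confirm that the interpreter used for the run actually picks up the freshly built GPU library. The following is only a minimal sketch, not part of the original report: the helper name `describe_mxnet_build` is hypothetical, and it assumes an MXNet version (1.5 or later) that exposes `mx.runtime.Features()` and `mx.context.num_gpus()`. If MXNet cannot be imported, it returns an empty report instead of failing.

```python
import importlib


def describe_mxnet_build():
    """Hypothetical helper: summarize the MXNet build this interpreter sees."""
    report = {"installed": False, "version": None, "num_gpus": None, "features": {}}
    try:
        mx = importlib.import_module("mxnet")
    except (ImportError, OSError):
        # MXNet (or its shared library) is not loadable from this environment.
        return report
    report["installed"] = True
    report["version"] = getattr(mx, "__version__", None)
    try:
        # Number of GPUs visible to MXNet; fails on CPU-only builds or old versions.
        report["num_gpus"] = mx.context.num_gpus()
    except Exception:
        pass
    try:
        # Compile-time features; assumes the runtime module is available (MXNet >= 1.5).
        features = mx.runtime.Features()
        for name in ("CUDA", "CUDNN", "OPENCV"):
            report["features"][name] = features.is_enabled(name)
    except Exception:
        pass
    return report


if __name__ == "__main__":
    for key, value in sorted(describe_mxnet_build().items()):
        print("{}: {}".format(key, value))
```

With the `make` flags shown above, one would expect `CUDA`, `CUDNN` and `OPENCV` to be reported as enabled and `num_gpus` to be at least 1; anything else suggests the script is importing a different MXNet installation than the one just built.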
