mahmoodn commented on issue #16001: Low kernel performance
URL: https://github.com/apache/incubator-mxnet/issues/16001#issuecomment-526295006

OK, I installed only the cuda-9.2 toolkit in a separate path and changed PATH and LD_LIBRARY_PATH to pick up that path.

```
$ which nvprof
/usr/local/cuda-9.2/bin/nvprof
$ nvprof -o run.nvvp python text_cnn.py --num-epochs=1 --gpus=0
Loading data...
Train/Valid split: 9662/1000
('train shape:', (9662, 56))
('valid shape:', (1000, 56))
('sentence max words', 56)
('embedding size', 300)
('vocab size', 18765)
[13:51:55] src/operator/tensor/./matrix_op-inl.h:166: Using target_shape will be deprecated.
[13:51:55] src/operator/tensor/./matrix_op-inl.h:166: Using target_shape will be deprecated.
==7727== NVPROF is profiling process 7727, command: python text_cnn.py --num-epochs=1 --gpus=0
==7727== Warning: Profiling results might be incorrect with current version of nvcc compiler used to compile cuda app. Compile with nvcc compiler 9.0 or later version to get correct profiling results. Ignore this warning if code is already compiled with the recommended nvcc version
[13:51:58] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [50] Speed: 7360.60 samples/sec accuracy=0.556863
INFO:root:Epoch[0] Batch [100] Speed: 8061.68 samples/sec accuracy=0.608800
INFO:root:Epoch[0] Batch [150] Speed: 8161.80 samples/sec accuracy=0.650800
INFO:root:Epoch[0] Train-accuracy=0.664186
INFO:root:Epoch[0] Time cost=1.391
INFO:root:Epoch[0] Validation-accuracy=0.665000
==7727== Error: Internal profiling error 4054:34.
======== Error: CUDA profiling error.
```

Although the run ended with an error, the generated output file is available at https://srv-file7.gofile.io/download/edLeC9/run.zip

As you can see, `AddTakeGradLargeBatchKernel` is at the top of the profile.
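For anyone reproducing this, here is a minimal Python sketch of how one might confirm which directory wins on PATH and LD_LIBRARY_PATH after changing them. The helper name `first_dir_containing` is hypothetical, and looking for `libcudart.so` is an assumption about which library is worth checking; neither comes from the run above.

```python
import os

def first_dir_containing(search_path, filename):
    """Return the first directory in a search-path string that holds filename, or None."""
    for directory in search_path.split(os.pathsep):
        # Skip empty entries (e.g. from a trailing separator).
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None

# Report which toolkit directory the current environment resolves to.
# "libcudart.so" is an assumed library name, used here only for illustration.
print("nvprof found in:", first_dir_containing(os.environ.get("PATH", ""), "nvprof"))
print("libcudart.so found in:",
      first_dir_containing(os.environ.get("LD_LIBRARY_PATH", ""), "libcudart.so"))
```

If both lines report directories under the same toolkit root, the profiler and the runtime library are probably coming from the same install. This does not say which CUDA version the MXNet binary itself was built against.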

As shown below, I installed mxnet-cu92 from pip, then cloned the repo and checked out v1.2.0.

```
$ pip install --user mxnet-cu92==1.2.0
Collecting mxnet-cu92==1.2.0
  Downloading https://files.pythonhosted.org/packages/4d/20/970b134fcf4f783dc14d9cd43530bb79fe35922813c5066be05c28366398/mxnet_cu92-1.2.0-py2.py3-none-manylinux1_x86_64.whl (395.0MB)
    100% |████████████████████████████████| 395.0MB 3.7kB/s
Collecting graphviz<0.9.0,>=0.8.1 (from mxnet-cu92==1.2.0)
  Using cached https://files.pythonhosted.org/packages/53/39/4ab213673844e0c004bed8a0781a0721a3f6bb23eb8854ee75c236428892/graphviz-0.8.4-py2.py3-none-any.whl
Collecting numpy<1.15.0,>=1.8.2 (from mxnet-cu92==1.2.0)
  Downloading https://files.pythonhosted.org/packages/4e/5b/1077ec0ebfa06f42057e8315bc8e05f5978b6fd0f582879f35f4d62ff124/numpy-1.14.6-cp27-cp27mu-manylinux1_x86_64.whl (13.8MB)
    100% |████████████████████████████████| 13.8MB 100kB/s
Collecting requests<2.19.0,>=2.18.4 (from mxnet-cu92==1.2.0)
  Downloading https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl (88kB)
    100% |████████████████████████████████| 92kB 1.6MB/s
Collecting idna<2.7,>=2.5 (from requests<2.19.0,>=2.18.4->mxnet-cu92==1.2.0)
  Downloading https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl (56kB)
    100% |████████████████████████████████| 61kB 1.1MB/s
Collecting urllib3<1.23,>=1.21.1 (from requests<2.19.0,>=2.18.4->mxnet-cu92==1.2.0)
  Downloading https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl (132kB)
    100% |████████████████████████████████| 133kB 5.5MB/s
Collecting certifi>=2017.4.17 (from requests<2.19.0,>=2.18.4->mxnet-cu92==1.2.0)
  Using cached https://files.pythonhosted.org/packages/69/1b/b853c7a9d4f6a6d00749e94eb6f3a041e342a885b87340b79c1ef73e3a78/certifi-2019.6.16-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests<2.19.0,>=2.18.4->mxnet-cu92==1.2.0)
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Installing collected packages: graphviz, numpy, idna, urllib3, certifi, chardet, requests, mxnet-cu92
Successfully installed certifi-2019.6.16 chardet-3.0.4 graphviz-0.8.4 idna-2.8 mxnet-cu92-1.2.0 numpy-1.16.4 requests-2.22.0 urllib3-1.25.3
mh.naderan@1080Ti:~/mx$ git clone https://github.com/apache/incubator-mxnet.git
Cloning into 'incubator-mxnet'...
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 99016 (delta 0), reused 6 (delta 0), pack-reused 99008
Receiving objects: 100% (99016/99016), 61.75 MiB | 7.56 MiB/s, done.
Resolving deltas: 100% (65969/65969), done.
mh.naderan@1080Ti:~/mx$ cd incubator-mxnet
mh.naderan@1080Ti:~/mx/incubator-mxnet$ git checkout v1.2.0
Branch 'v1.2.0' set up to track remote branch 'v1.2.0' from 'origin'.
Switched to a new branch 'v1.2.0'
mh.naderan@1080Ti:~/mx/incubator-mxnet$ git submodule update --init
Submodule '3rdparty/cub' (https://github.com/dmlc/cub) registered for path '3rdparty/cub'
Submodule '3rdparty/dlpack' (https://github.com/dmlc/dlpack) registered for path '3rdparty/dlpack'
Submodule '3rdparty/dmlc-core' (https://github.com/dmlc/dmlc-core.git) registered for path '3rdparty/dmlc-core'
Submodule '3rdparty/googletest' (https://github.com/google/googletest.git) registered for path '3rdparty/googletest'
Submodule '3rdparty/mkldnn' (https://github.com/intel/mkl-dnn.git) registered for path '3rdparty/mkldnn'
Submodule '3rdparty/mshadow' (https://github.com/dmlc/mshadow.git) registered for path '3rdparty/mshadow'
Submodule '3rdparty/nnvm' (https://github.com/dmlc/nnvm) registered for path '3rdparty/nnvm'
Submodule '3rdparty/openmp' (https://github.com/llvm-mirror/openmp) registered for path '3rdparty/openmp'
Submodule '3rdparty/ps-lite' (https://github.com/dmlc/ps-lite) registered for path '3rdparty/ps-lite'
Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/cub'...
Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/dlpack'...
Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/dmlc-core'...
Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/googletest'...
Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/mkldnn'...
Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/mshadow'...
Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/nnvm'...
Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/openmp'...
Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/ps-lite'...
Submodule path '3rdparty/cub': checked out '05eb57faa0a4cac37c2a86fdf4b4dc865a95a1a3'
Submodule path '3rdparty/dlpack': checked out '10892ac964f1af7c81aae145cd3fab78bbccd297'
Submodule path '3rdparty/dmlc-core': checked out 'e9446f5a53cf5e61273deff7ce814093d2791766'
Submodule path '3rdparty/googletest': checked out 'ec44c6c1675c25b9827aacd08c02433cccde7780'
Submodule path '3rdparty/mkldnn': checked out 'f5218ff4fd2d16d13aada2e632afd18f2514fee3'
Submodule path '3rdparty/mshadow': checked out 'a8c650ce8a708608a282c4d1e251c57873a8db25'
Submodule path '3rdparty/nnvm': checked out '0ca68e89ced69c0100aed32343cf30b45cafca7a'
Submodule path '3rdparty/openmp': checked out '37c72127e90360a020f351f18d9cccfc30e5145a'
Submodule path '3rdparty/ps-lite': checked out 'a6dda54604a07d1fb21b016ed1e3f4246b08222a'
mh.naderan@1080Ti:~/mx/incubator-mxnet/example/cnn_text_classification$ pip list | grep mxnet
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
mxnet-cu92 (1.2.0)
```
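
Since the example script comes from the git checkout while the library comes from pip, it may be worth confirming that the two refer to the same release. A minimal sketch, assuming tags follow the `v<version>` pattern seen above; the helper name `same_release` is hypothetical:

```python
def same_release(package_version, git_tag):
    """Compare a pip version such as '1.2.0' with a git tag such as 'v1.2.0'."""
    # Drop surrounding whitespace and a leading 'v' on the tag before comparing.
    return package_version.strip() == git_tag.strip().lstrip("vV")

# Values copied from the pip and git output in this comment.
print(same_release("1.2.0", "v1.2.0"))
```

In a live session the first argument could instead be `mxnet.__version__`, read after `import mxnet`.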

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
