mahmoodn commented on issue #16001: Low kernel performance
URL: https://github.com/apache/incubator-mxnet/issues/16001#issuecomment-526295006
 
 
   OK, I installed only the CUDA 9.2 toolkit in a separate path and changed PATH and 
LD_LIBRARY_PATH to point to that path.
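
For reference, the environment change amounts to something like the sketch below. The `bin` path matches the `which nvprof` output in the log; the `lib64` subdirectory and the `CUDA_HOME` variable name are assumptions based on the usual toolkit layout, not details confirmed in this thread.

```shell
# Assumed install prefix of the side-by-side CUDA 9.2 toolkit.
CUDA_HOME=/usr/local/cuda-9.2

# Put the 9.2 tools (nvprof, nvcc) ahead of any other CUDA version on PATH.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"

# Do the same for the runtime libraries; lib64 is the usual location on
# x86_64 Linux (assumption).
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Prepending rather than overwriting keeps the rest of the environment intact while making sure the 9.2 binaries and libraries are found first.
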
   ```
   $ which nvprof
   /usr/local/cuda-9.2/bin/nvprof
   $ nvprof -o run.nvvp python text_cnn.py --num-epochs=1 --gpus=0
   Loading data...
   Train/Valid split: 9662/1000
   ('train shape:', (9662, 56))
   ('valid shape:', (1000, 56))
   ('sentence max words', 56)
   ('embedding size', 300)
   ('vocab size', 18765)
   [13:51:55] src/operator/tensor/./matrix_op-inl.h:166: Using target_shape 
will be deprecated.
   [13:51:55] src/operator/tensor/./matrix_op-inl.h:166: Using target_shape 
will be deprecated.
   ==7727== NVPROF is profiling process 7727, command: python text_cnn.py 
--num-epochs=1 --gpus=0
   ==7727== Warning: Profiling results might be incorrect with current version 
of nvcc compiler used to compile cuda app. Compile with nvcc compiler 9.0 or 
later version to get correct profiling results. Ignore this warning if code is 
already compiled with the recommended nvcc version
   [13:51:58] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running 
performance tests to find the best convolution algorithm, this can take a 
while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
   INFO:root:Epoch[0] Batch [50]   Speed: 7360.60 samples/sec      
accuracy=0.556863
   INFO:root:Epoch[0] Batch [100]  Speed: 8061.68 samples/sec      
accuracy=0.608800
   INFO:root:Epoch[0] Batch [150]  Speed: 8161.80 samples/sec      
accuracy=0.650800
   INFO:root:Epoch[0] Train-accuracy=0.664186
   INFO:root:Epoch[0] Time cost=1.391
   INFO:root:Epoch[0] Validation-accuracy=0.665000
   ==7727== Error: Internal profiling error 4054:34.
   ======== Error: CUDA profiling error.
   ```
   Although profiling ended with an error, the generated output file is available 
at https://srv-file7.gofile.io/download/edLeC9/run.zip
   
   
   As you can see, `AddTakeGradLargeBatchKernel` is at the top of the profile.
   
   As shown below, I installed mxnet-cu92 1.2.0 from pip, then cloned the 
repo and checked out v1.2.0.
   
   ```
   $ pip install --user mxnet-cu92==1.2.0
   Collecting mxnet-cu92==1.2.0
     Downloading 
https://files.pythonhosted.org/packages/4d/20/970b134fcf4f783dc14d9cd43530bb79fe35922813c5066be05c28366398/mxnet_cu92-1.2.0-py2.py3-none-manylinux1_x86_64.whl
 (395.0MB)
       100% |████████████████████████████████| 395.0MB 3.7kB/s
   Collecting graphviz<0.9.0,>=0.8.1 (from mxnet-cu92==1.2.0)
     Using cached 
https://files.pythonhosted.org/packages/53/39/4ab213673844e0c004bed8a0781a0721a3f6bb23eb8854ee75c236428892/graphviz-0.8.4-py2.py3-none-any.whl
   Collecting numpy<1.15.0,>=1.8.2 (from mxnet-cu92==1.2.0)
     Downloading 
https://files.pythonhosted.org/packages/4e/5b/1077ec0ebfa06f42057e8315bc8e05f5978b6fd0f582879f35f4d62ff124/numpy-1.14.6-cp27-cp27mu-manylinux1_x86_64.whl
 (13.8MB)
       100% |████████████████████████████████| 13.8MB 100kB/s
   Collecting requests<2.19.0,>=2.18.4 (from mxnet-cu92==1.2.0)
     Downloading 
https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl
 (88kB)
       100% |████████████████████████████████| 92kB 1.6MB/s
   Collecting idna<2.7,>=2.5 (from requests<2.19.0,>=2.18.4->mxnet-cu92==1.2.0)
     Downloading 
https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl
 (56kB)
       100% |████████████████████████████████| 61kB 1.1MB/s
   Collecting urllib3<1.23,>=1.21.1 (from 
requests<2.19.0,>=2.18.4->mxnet-cu92==1.2.0)
     Downloading 
https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl
 (132kB)
       100% |████████████████████████████████| 133kB 5.5MB/s
   Collecting certifi>=2017.4.17 (from 
requests<2.19.0,>=2.18.4->mxnet-cu92==1.2.0)
     Using cached 
https://files.pythonhosted.org/packages/69/1b/b853c7a9d4f6a6d00749e94eb6f3a041e342a885b87340b79c1ef73e3a78/certifi-2019.6.16-py2.py3-none-any.whl
   Collecting chardet<3.1.0,>=3.0.2 (from 
requests<2.19.0,>=2.18.4->mxnet-cu92==1.2.0)
     Using cached 
https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
   Installing collected packages: graphviz, numpy, idna, urllib3, certifi, 
chardet, requests, mxnet-cu92
   Successfully installed certifi-2019.6.16 chardet-3.0.4 graphviz-0.8.4 
idna-2.8 mxnet-cu92-1.2.0 numpy-1.16.4 requests-2.22.0 urllib3-1.25.3
   
   
   
   mh.naderan@1080Ti:~/mx$ git clone 
https://github.com/apache/incubator-mxnet.git
   Cloning into 'incubator-mxnet'...
   remote: Enumerating objects: 8, done.
   remote: Counting objects: 100% (8/8), done.
   remote: Compressing objects: 100% (8/8), done.
   remote: Total 99016 (delta 0), reused 6 (delta 0), pack-reused 99008
   Receiving objects: 100% (99016/99016), 61.75 MiB | 7.56 MiB/s, done.
   Resolving deltas: 100% (65969/65969), done.
   mh.naderan@1080Ti:~/mx$ cd incubator-mxnet
   mh.naderan@1080Ti:~/mx/incubator-mxnet$ git checkout v1.2.0
   Branch 'v1.2.0' set up to track remote branch 'v1.2.0' from 'origin'.
   Switched to a new branch 'v1.2.0'
   mh.naderan@1080Ti:~/mx/incubator-mxnet$ git submodule update --init
   Submodule '3rdparty/cub' (https://github.com/dmlc/cub) registered for path 
'3rdparty/cub'
   Submodule '3rdparty/dlpack' (https://github.com/dmlc/dlpack) registered for 
path '3rdparty/dlpack'
   Submodule '3rdparty/dmlc-core' (https://github.com/dmlc/dmlc-core.git) 
registered for path '3rdparty/dmlc-core'
   Submodule '3rdparty/googletest' (https://github.com/google/googletest.git) 
registered for path '3rdparty/googletest'
   Submodule '3rdparty/mkldnn' (https://github.com/intel/mkl-dnn.git) 
registered for path '3rdparty/mkldnn'
   Submodule '3rdparty/mshadow' (https://github.com/dmlc/mshadow.git) 
registered for path '3rdparty/mshadow'
   Submodule '3rdparty/nnvm' (https://github.com/dmlc/nnvm) registered for path 
'3rdparty/nnvm'
   Submodule '3rdparty/openmp' (https://github.com/llvm-mirror/openmp) 
registered for path '3rdparty/openmp'
   Submodule '3rdparty/ps-lite' (https://github.com/dmlc/ps-lite) registered 
for path '3rdparty/ps-lite'
   Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/cub'...
   Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/dlpack'...
   Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/dmlc-core'...
   Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/googletest'...
   Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/mkldnn'...
   Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/mshadow'...
   Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/nnvm'...
   Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/openmp'...
   Cloning into '/home/mh.naderan/mx/incubator-mxnet/3rdparty/ps-lite'...
   Submodule path '3rdparty/cub': checked out 
'05eb57faa0a4cac37c2a86fdf4b4dc865a95a1a3'
   Submodule path '3rdparty/dlpack': checked out 
'10892ac964f1af7c81aae145cd3fab78bbccd297'
   Submodule path '3rdparty/dmlc-core': checked out 
'e9446f5a53cf5e61273deff7ce814093d2791766'
   Submodule path '3rdparty/googletest': checked out 
'ec44c6c1675c25b9827aacd08c02433cccde7780'
   Submodule path '3rdparty/mkldnn': checked out 
'f5218ff4fd2d16d13aada2e632afd18f2514fee3'
   Submodule path '3rdparty/mshadow': checked out 
'a8c650ce8a708608a282c4d1e251c57873a8db25'
   Submodule path '3rdparty/nnvm': checked out 
'0ca68e89ced69c0100aed32343cf30b45cafca7a'
   Submodule path '3rdparty/openmp': checked out 
'37c72127e90360a020f351f18d9cccfc30e5145a'
   Submodule path '3rdparty/ps-lite': checked out 
'a6dda54604a07d1fb21b016ed1e3f4246b08222a'
   
   
   
   
   
   
   mh.naderan@1080Ti:~/mx/incubator-mxnet/example/cnn_text_classification$ pip 
list | grep mxnet
   DEPRECATION: The default format will switch to columns in the future. You 
can use --format=(legacy|columns) (or define a format=(legacy|columns) in your 
pip.conf under the [list] section) to disable this warning.
   mxnet-cu92 (1.2.0)
   ```
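
As a quick sanity check that the example script from the checkout is running against the matching library, the two versions can be compared along these lines. This is only a sketch: `versions_match` is a hypothetical helper, and the `v<version>` naming is assumed from the branch name in the log above.

```shell
# Hypothetical helper: succeed only if a pip version (e.g. 1.2.0)
# corresponds to a git branch/tag name of the form v<version>.
versions_match() {
    [ "v$1" = "$2" ]
}

# Values copied from the log above. In practice they could come from:
#   python -c 'import mxnet; print(mxnet.__version__)'
#   git rev-parse --abbrev-ref HEAD
versions_match "1.2.0" "v1.2.0" && echo "wheel and checkout match"
```
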
   
   
   
