kpuatamazon opened a new issue #18699:
URL: https://github.com/apache/incubator-mxnet/issues/18699


   I'm experiencing a 4% slowdown in Sockeye due to commit 83b51703e "Add simplified HybridBlock.forward without F (#17530)". Benchmark times per commit, newest first:
   
   | Commit | Time (s) |
   | -- | -- |
   | 08528c5cb9449e634f55cb72ceb2b1910d176f80 | 295.90 |
   | 56e79853ad5cf98baf84454eb595c7658bef6ee6 | 295.73 |
   | d4052fde4a94c8a70f805fe9b44980125afa8686 | 295.69 |
   | 9a355ebc1dfc5c087d41bb24c946e7f773e01af9 | 295.28 |
   | 3840786a25b16d0cfe6411e26f25aba8d3b574ff | 293.25 |
   | 83b51703ed354f41024423f140de38df2ba22d50 | 293.70 |
   | 8e3951876b3598c8b52606a467add5f239d88b38 | 281.37 |
   | b1338999c1289266487237c6926b79db8bf2fd6c | 282.58 |
   | 2f358fdc3129f02d8f83775bf1bed61d750f76b6 | 281.95 |
   | f01dc80f030d2d1912c8e134c95f373e9f1f8e7b | 283.60 |
   | 3667e9a2056e5ae6b94ba4f61675137012d96e82 | 283.79 |
   | f7c43234d08d4b3a9401f2d5ffc1a98795765ad5 | 282.60 |
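
   The slowdown can be located in the table by computing each commit's change relative to the one before it. Here is a minimal sketch using awk. The function name `step_changes` is illustrative, and the timings are the ones from the table, reordered oldest-first:

   ```bash
   # Print each commit's change in benchmark time relative to the previous commit.
   # Input on stdin: "commit seconds" pairs, oldest commit first.
   step_changes() {
     awk 'NR > 1 { printf "%s %+.2f%%\n", substr($1, 1, 9), ($2 - prev) / prev * 100 }
          { prev = $2 }'
   }

   # Timings from the table above, reordered oldest-first.
   printf '%s\n' \
     'f7c43234d08d4b3a9401f2d5ffc1a98795765ad5 282.60' \
     '3667e9a2056e5ae6b94ba4f61675137012d96e82 283.79' \
     'f01dc80f030d2d1912c8e134c95f373e9f1f8e7b 283.60' \
     '2f358fdc3129f02d8f83775bf1bed61d750f76b6 281.95' \
     'b1338999c1289266487237c6926b79db8bf2fd6c 282.58' \
     '8e3951876b3598c8b52606a467add5f239d88b38 281.37' \
     '83b51703ed354f41024423f140de38df2ba22d50 293.70' \
     '3840786a25b16d0cfe6411e26f25aba8d3b574ff 293.25' \
     '9a355ebc1dfc5c087d41bb24c946e7f773e01af9 295.28' \
     'd4052fde4a94c8a70f805fe9b44980125afa8686 295.69' \
     '56e79853ad5cf98baf84454eb595c7658bef6ee6 295.73' \
     '08528c5cb9449e634f55cb72ceb2b1910d176f80 295.90' |
     step_changes
   ```

   With these timings, 83b51703e is the only step that changes by more than 1%: it prints `83b51703e +4.38%`.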
   
   But it's slightly more complicated. At the beginning (f7c432), the build worked with MKLDNN at cb2cc7ac. Then 3667e9a2056e5ae6b94ba4f61675137012d96e82 broke the build with an MKLDNN upgrade; a series of commits went in while MKLDNN was broken, so they don't compile, and 08528c5cb9449e634f55cb72ceb2b1910d176f80 fixed it by downgrading MKLDNN back to cb2cc7ac.
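
   Which MKLDNN revision a given MXNet commit expects can be read from the submodule's gitlink entry in that commit's tree, without checking anything out. A minimal sketch, assuming the submodule path `3rdparty/mkldnn` used in the script below; the helper name `mkldnn_pin` is hypothetical:

   ```bash
   # Print the MKLDNN commit pinned by an MXNet commit, read from the
   # submodule's gitlink entry (mode 160000) in that commit's tree.
   # Run from the root of the MXNet checkout.
   mkldnn_pin() {
     local commit=$1
     git ls-tree "$commit" 3rdparty/mkldnn | awk '{ print $3 }'
   }

   # Example, from inside ~/mxnet:
   #   for c in f7c43234d 3667e9a20 08528c5cb; do
   #     echo "$c $(mkldnn_pin "$c")"
   #   done
   ```

   Comparing the output for commits before, at, and after 3667e9a would show which MKLDNN revision each one references, independently of the pin the build script applies.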
   
   I therefore wrote this script, which pins MKLDNN back to cb2cc7ac so that every commit builds, to find the relevant commit:
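   ```bash
   #!/bin/bash
   # Build MXNet at the commit given as $1 with MKLDNN pinned to cb2cc7ac,
   # install it into the virtualenv, then run the benchmark.
   set -e -o pipefail
   export LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_rt.so
   export CXXFLAGS="-O3 -march=native -DUSE_MKL -I/opt/intel/mkl/include -pipe"
   commit=${1:?"usage: $0 <mxnet-commit>"}
   . ~/test/bin/activate
   cd ~/mxnet
   # Discard local changes plus untracked and ignored files in MXNet and all submodules.
   git reset --hard
   git checkout --force "$commit"
   git clean -xdff
   git reset --hard
   git submodule foreach --recursive git clean -ffxd
   git submodule foreach --recursive git reset --hard
   # Check out the submodule revisions recorded by this commit.
   git submodule update --init --recursive
   # Override MKLDNN with a revision known to build, whatever the commit references.
   cd 3rdparty/mkldnn/
   git checkout cb2cc7ac17ff4e2ef50805c7048d33256d82be4d
   cd ../..
   # Fresh CPU-only release build.
   rm -rf build
   mkdir build
   cd build
   cmake -GNinja -DUSE_CUDA=OFF -DCMAKE_BUILD_TYPE=Release ..
   ninja -j 4
   # Install the Python package in editable mode, then benchmark it.
   cd ../python
   pip3 install -e .
   ~/benchmark.sh
   ```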
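
   The script tests one commit at a time. The same search could be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. A minimal sketch of a wrapper, assuming the script above is saved as the hypothetical `~/build-and-bench.sh` and that `~/benchmark.sh` prints the elapsed seconds as the last line of its output; the 288 s threshold is likewise an assumption, chosen between the ~282 s and ~294 s plateaus in the table:

   ```bash
   #!/bin/bash
   # Hypothetical `git bisect run` helper. Assumes the build-and-benchmark
   # script prints the elapsed seconds as the last line of its stdout.
   set -o pipefail

   THRESHOLD=${THRESHOLD:-288}  # seconds; assumed cut-off between ~282 s and ~294 s

   # Map a timing to git bisect's exit-code convention:
   # 0 = good (fast), 1 = bad (slow), 125 = skip (no usable timing).
   classify_time() {
     local t=$1
     [[ $t =~ ^[0-9]+([.][0-9]+)?$ ]] || return 125
     if awk -v t="$t" -v th="$THRESHOLD" 'BEGIN { exit !(t + 0 < th + 0) }'; then
       return 0
     fi
     return 1
   }

   # Build and time the commit that git bisect has checked out; a failed
   # build or benchmark makes bisect skip the commit.
   bisect_step() {
     local build_and_bench=$1 t
     t=$("$build_and_bench" "$(git rev-parse HEAD)" | tail -n 1) || exit 125
     classify_time "$t"
     exit $?
   }

   # Only act when given arguments, i.e. when invoked by `git bisect run`.
   if (( $# > 0 )); then
     bisect_step "$@"
   fi
   ```

   Saved outside the repository as, say, `~/bisect-step.sh` (the build script's `git clean -xdff` would otherwise delete an untracked copy inside `~/mxnet`), it would be driven from `~/mxnet` with `git bisect start <bad-commit> <good-commit>` followed by `git bisect run ~/bisect-step.sh ~/build-and-bench.sh`.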
   
   Test conditions:
   - c5.2xlarge instance, specifically an "Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz"
   - `OMP_NUM_THREADS=3`
   - MKL backend forced with `export CXXFLAGS="-O3 -march=native -DUSE_MKL -I/opt/intel/mkl/include -pipe"`
   - Sockeye 45d704a4
   - Batch size 1
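
   `~/benchmark.sh` itself is not shown. As a rough sketch of how a run under these conditions might be timed, here is a hypothetical wrapper, `bench_seconds`, that sets `OMP_NUM_THREADS=3` for one command and prints only its wall-clock time in seconds, using bash's `time` keyword with `TIMEFORMAT=%R`. The translation command it would wrap is left as a placeholder:

   ```bash
   # Hypothetical timing wrapper: runs a command with OMP_NUM_THREADS=3 and
   # prints only the elapsed wall-clock seconds; returns the command's status.
   bench_seconds() {
     local TIMEFORMAT=%R
     local status=0
     # The command's own output is discarded; the group's stderr, where
     # `time` writes its report, is redirected to stdout.
     { time OMP_NUM_THREADS=3 "$@" >/dev/null 2>&1 || status=$?; } 2>&1
     return "$status"
   }

   # Example (placeholder command for a batch-size-1 Sockeye translation run):
   #   bench_seconds ./translate-test-set.sh
   ```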
   
   More broadly, I'm trying to unpick the performance differences seen in Sockeye as MXNet has changed since v1.5.x. The image below shows the benchmark for each commit since master diverged from v1.5.x, with v1.5.x on the left and cbbb864005c5dc5979d7fb7a849f35d0da9b55fd on the right.
   
   
![master](https://user-images.githubusercontent.com/56725192/87342592-69661580-c543-11ea-85d1-36a4c8675ab1.png)
   
   The first big slowdown, on the left, is an MKLDNN change, but that appears to have been fixed. Near the right there is another slowdown that does not appear to come from a single commit but rather from a series of incremental changes, and 83b51703e is the first of them I've been able to isolate.

