kpuatamazon opened a new issue #18699: URL: https://github.com/apache/incubator-mxnet/issues/18699
I'm experiencing a 4% slowdown due to commit 83b51703e "Add simplified HybridBlock.forward without F (#17530)".

| Commit | Time (s) |
| -- | -- |
| 08528c5cb9449e634f55cb72ceb2b1910d176f80 | 295.90 |
| 56e79853ad5cf98baf84454eb595c7658bef6ee6 | 295.73 |
| d4052fde4a94c8a70f805fe9b44980125afa8686 | 295.69 |
| 9a355ebc1dfc5c087d41bb24c946e7f773e01af9 | 295.28 |
| 3840786a25b16d0cfe6411e26f25aba8d3b574ff | 293.25 |
| 83b51703ed354f41024423f140de38df2ba22d50 | 293.70 |
| 8e3951876b3598c8b52606a467add5f239d88b38 | 281.37 |
| b1338999c1289266487237c6926b79db8bf2fd6c | 282.58 |
| 2f358fdc3129f02d8f83775bf1bed61d750f76b6 | 281.95 |
| f01dc80f030d2d1912c8e134c95f373e9f1f8e7b | 283.60 |
| 3667e9a2056e5ae6b94ba4f61675137012d96e82 | 283.79 |
| f7c43234d08d4b3a9401f2d5ffc1a98795765ad5 | 282.60 |

But it's slightly more complicated. At the beginning (f7c432), the build worked with MKLDNN at cb2cc7ac. Then 3667e9a2056e5ae6b94ba4f61675137012d96e82 broke the build with an MKLDNN upgrade, a bunch of commits went in with MKLDNN broken so they don't compile, and 08528c5cb9449e634f55cb72ceb2b1910d176f80 fixed it by downgrading MKLDNN back to cb2cc7ac. Hence I wrote this script, which downgrades MKLDNN so that every commit builds and lets me find the relevant commit:

```bash
#!/bin/bash
# Usage: pass the MXNet commit to build and benchmark as $1.
# Force the MKL backend.
export LD_PRELOAD=/opt/intel/mkl/lib/intel64/libmkl_rt.so
export CXXFLAGS="-O3 -march=native -DUSE_MKL -I/opt/intel/mkl/include -pipe"
set -e -o pipefail
. ~/test/bin/activate
cd ~/mxnet
# Check out the requested commit and wipe all untracked and ignored files.
git reset --hard
git checkout --force "$1"
git clean -xdff
git reset --hard
git submodule foreach --recursive git clean -ffxd
git submodule foreach --recursive git reset --hard
git submodule update --init --recursive
# Pin MKLDNN to the known-good revision so the commit compiles.
cd 3rdparty/mkldnn/
git checkout cb2cc7ac17ff4e2ef50805c7048d33256d82be4d
cd ../..
# Fresh CPU-only release build.
rm -rf build
mkdir build
cd build
cmake -GNinja -DUSE_CUDA=OFF -DCMAKE_BUILD_TYPE=Release ..
ninja -j 4
cd ../python
pip3 install -e .
~/benchmark.sh
```

Test conditions:
- c5.2xlarge, specifically an "Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz"
- `OMP_NUM_THREADS=3`
- Forced MKL backend with `export CXXFLAGS="-O3 -march=native -DUSE_MKL -I/opt/intel/mkl/include -pipe"`
- Sockeye 45d704a4
- Batch size 1

More broadly, I'm trying to unpick performance differences seen in Sockeye as MXNet has changed since v1.5.x. This image shows commits since master diverged from v1.5.x. v1.5.x is on the left and cbbb864005c5dc5979d7fb7a849f35d0da9b55fd is on the right.

![master](https://user-images.githubusercontent.com/56725192/87342592-69661580-c543-11ea-85d1-36a4c8675ab1.png)

The first big slowdown is an MKLDNN change on the left, but that appears to have been fixed. Then there's a slowdown near the right that doesn't appear to come from a single commit but rather from a bunch of incremental changes. This is the first of them I've been able to isolate.
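As a quick sanity check on the 4% figure, the relative change between the two timings on either side of the jump (281.37 s at 8e39518 and 293.70 s at 83b5170) can be computed directly from the table. This is only a minimal sketch, assuming `awk` is available; the function name `pct_slowdown` is made up for illustration and is not part of the benchmark setup.

```bash
#!/bin/bash
# Hypothetical helper: percentage slowdown going from time $1 to time $2.
pct_slowdown() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.2f\n", (after - before) / before * 100 }'
}

# Timings taken from the table: 8e39518 (before) and 83b5170 (after).
pct_slowdown 281.37 293.70
```

The same helper can be pointed at any other pair of rows to see how much of the gap to the newest commit is explained by this one change.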