szha opened a new issue #18244:
URL: https://github.com/apache/incubator-mxnet/issues/18244


   # Description
   In #18146 we introduced parallel testing in CI in the hope of reducing test time. During that effort we noticed that the test nodes built with MKL or MKLDNN run much slower than the node built without either. This issue summarizes the setup and the current time differences.
   
   ## Setup
   The results in this issue come from [this CI run](http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-18240/2/pipeline/). The time differences are similar in validation runs on the master branch.
   
   To show the results, we compare the following test nodes.
   
   ### [Python 3: CPU](http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-18240/2/pipeline/363)
   
   #### [Build](https://github.com/apache/incubator-mxnet/blob/c329a524c520ec2b41aa4bff5ee113ff7221b790/ci/docker/runtime_functions.sh#L390-L405)
   ```
   build_ubuntu_cpu_openblas() {
       set -ex
       cd /work/build
       CXXFLAGS="-Wno-error=strict-overflow" CC=gcc-7 CXX=g++-7 cmake \
           -DCMAKE_BUILD_TYPE="RelWithDebInfo" \
           -DENABLE_TESTCOVERAGE=ON \
           -DUSE_TVM_OP=ON \
           -DUSE_CPP_PACKAGE=ON \
           -DUSE_MKL_IF_AVAILABLE=OFF \
           -DUSE_MKLDNN=OFF \
           -DUSE_CUDA=OFF \
           -DUSE_DIST_KVSTORE=ON \
           -DBUILD_CYTHON_MODULES=ON \
           -G Ninja /work/mxnet
       ninja
   }
   ```
   
   ### [Python 3: MKL-CPU](http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-18240/2/pipeline/364)
   
   #### [Build](https://github.com/apache/incubator-mxnet/blob/c329a524c520ec2b41aa4bff5ee113ff7221b790/ci/docker/runtime_functions.sh#L425-L438)
   ```
   build_ubuntu_cpu_mkl() {
       set -ex
       cd /work/build
       CC=gcc-7 CXX=g++-7 cmake \
           -DCMAKE_BUILD_TYPE="RelWithDebInfo" \
           -DENABLE_TESTCOVERAGE=ON \
           -DUSE_MKLDNN=OFF \
           -DUSE_CUDA=OFF \
           -DUSE_TVM_OP=ON \
           -DUSE_MKL_IF_AVAILABLE=ON \
           -DUSE_BLAS=MKL \
           -GNinja /work/mxnet
       ninja
   }
   ```
   
   ### [Python 3: MKLDNN-CPU](http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-18240/2/pipeline/365)
   
   #### [Build](https://github.com/apache/incubator-mxnet/blob/c329a524c520ec2b41aa4bff5ee113ff7221b790/ci/docker/runtime_functions.sh#L632-L645)
   ```
   build_ubuntu_cpu_mkldnn() {
       set -ex
       cd /work/build
       CC=gcc-7 CXX=g++-7 cmake \
           -DCMAKE_BUILD_TYPE="RelWithDebInfo" \
           -DENABLE_TESTCOVERAGE=ON \
           -DUSE_MKL_IF_AVAILABLE=OFF \
           -DUSE_TVM_OP=ON \
           -DUSE_MKLDNN=ON \
           -DUSE_CUDA=OFF \
           -DUSE_CPP_PACKAGE=ON \
           -G Ninja /work/mxnet
       ninja
   }
   ```
   
   ### [Python 3: MKLDNN-MKL-CPU](http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-18240/2/pipeline/366)
   
   #### [Build](https://github.com/apache/incubator-mxnet/blob/c329a524c520ec2b41aa4bff5ee113ff7221b790/ci/docker/runtime_functions.sh#L647-L660)
   ```
   build_ubuntu_cpu_mkldnn_mkl() {
       set -ex
       cd /work/build
       CC=gcc-7 CXX=g++-7 cmake \
           -DCMAKE_BUILD_TYPE="RelWithDebInfo" \
           -DENABLE_TESTCOVERAGE=ON \
           -DUSE_MKLDNN=ON \
           -DUSE_CUDA=OFF \
           -DUSE_TVM_OP=ON \
           -DUSE_MKL_IF_AVAILABLE=ON \
           -DUSE_BLAS=MKL \
           -GNinja /work/mxnet
       ninja
   }
   ```
   
   ## Tests
   
   Each test node runs one of the following two test functions.
   
   ### [python3_ut](https://github.com/apache/incubator-mxnet/blob/c329a524c520ec2b41aa4bff5ee113ff7221b790/ci/jenkins/Jenkins_steps.groovy#L47-L51)
   
   ```
   unittest_ubuntu_python3_cpu() {
       set -ex
       export PYTHONPATH=./python/
       export MXNET_MKLDNN_DEBUG=0  # Ignored if not present
       export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
       export MXNET_SUBGRAPH_VERBOSE=0
       export MXNET_ENABLE_CYTHON=0
       export DMLC_LOG_STACK_TRACE_DEPTH=10
       pytest -m 'not serial' -n 4 --durations=50 --cov-report xml:tests_unittest.xml --verbose tests/python/unittest
       pytest -m 'serial' --durations=50 --cov-report xml:tests_unittest.xml --cov-append --verbose tests/python/unittest
       pytest -n 4 --durations=50 --cov-report xml:tests_quantization.xml --verbose tests/python/quantization
   }
   ```
   
   ### [python_ut_mkldnn](https://github.com/apache/incubator-mxnet/blob/c329a524c520ec2b41aa4bff5ee113ff7221b790/ci/jenkins/Jenkins_steps.groovy#L53-L57)
   
   ```
   unittest_ubuntu_python3_cpu_mkldnn() {
       set -ex
       export PYTHONPATH=./python/
       export MXNET_MKLDNN_DEBUG=0  # Ignored if not present
       export MXNET_STORAGE_FALLBACK_LOG_VERBOSE=0
       export MXNET_SUBGRAPH_VERBOSE=0
       export MXNET_ENABLE_CYTHON=0
       export DMLC_LOG_STACK_TRACE_DEPTH=10
       pytest -m 'not serial' -n 4 --durations=50 --cov-report xml:tests_unittest.xml --verbose tests/python/unittest
       pytest -m 'serial' --durations=50 --cov-report xml:tests_unittest.xml --cov-append --verbose tests/python/unittest
       pytest -n 4 --durations=50 --cov-report xml:tests_mkl.xml --verbose tests/python/mkl
   }
   ```
   
   ### Test steps
   
   To show fine-grained timing, I break each test node down into the following steps:
   - docker launch: the time it takes to prepare and launch the docker container
   - parallel unittest: `pytest -m 'not serial' -n 4 --durations=50 --cov-report xml:tests_unittest.xml --verbose tests/python/unittest`
   - serial unittest: `pytest -m 'serial' --durations=50 --cov-report xml:tests_unittest.xml --cov-append --verbose tests/python/unittest`
   - quantization test: `pytest -n 4 --durations=50 --cov-report xml:tests_quantization.xml --verbose tests/python/quantization`
   - mkl test: `pytest -n 4 --durations=50 --cov-report xml:tests_mkl.xml --verbose tests/python/mkl`
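
   The issue does not say how the per-step times were collected. As a minimal sketch, assuming one wanted to reproduce the breakdown by hand, each step could be wrapped in a small timing helper. `time_step` is a hypothetical name and is not part of `ci/docker/runtime_functions.sh`.

   ```shell
   # Hypothetical helper, not part of ci/docker/runtime_functions.sh:
   # run a command, print its wall-clock time in whole seconds to stderr,
   # and pass the command's exit status through.
   time_step() {
       local label start status
       label="$1"
       shift
       start=$(date +%s)
       status=0
       "$@" || status=$?
       echo "${label}: $(( $(date +%s) - start ))s (exit ${status})" >&2
       return "$status"
   }

   # Example usage with two of the steps above (commented out so the sketch
   # runs without an MXNet checkout):
   # time_step "serial unittest" pytest -m 'serial' --durations=50 --cov-report xml:tests_unittest.xml --cov-append --verbose tests/python/unittest
   # time_step "mkl test" pytest -n 4 --durations=50 --cov-report xml:tests_mkl.xml --verbose tests/python/mkl
   ```

   The helper reports to stderr so that it does not mix with pytest's own output on stdout, and it returns the wrapped command's status so that a failing step still fails under `set -e`.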
   
   # Results
   
   All results below are in seconds. The `Python 3: CPU` results are the baseline, and percentages give the increase over that baseline.
   
   | Test                     | docker launch | parallel unittest   | serial unittest     | quantization | mkl test |
   |--------------------------|---------------|---------------------|---------------------|--------------|----------|
   | Python 3: CPU            | 892           | 981                 | 951                 | 39           | N/A      |
   | Python 3: MKL-CPU        | 874           | <b>4996 (+409%)</b> | <b>4380 (+361%)</b> | 67           | N/A      |
   | Python 3: MKLDNN-CPU     | 892           | <b>5096 (+419%)</b> | <b>3314 (+248%)</b> | N/A          | 1080     |
   | Python 3: MKLDNN-MKL-CPU | 901           | <b>3899 (+397%)</b> | <b>3507 (+269%)</b> | N/A          | 1210     |
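
   The percentages appear to be the increase over the `Python 3: CPU` baseline, rounded to the nearest integer. A minimal sketch of that arithmetic, assuming this interpretation (`slowdown_pct` is a hypothetical name introduced here):

   ```shell
   # Relative increase of a measurement over the baseline as a rounded
   # integer percentage: round((value - baseline) * 100 / baseline).
   # Integer arithmetic only; assumes value >= baseline > 0.
   slowdown_pct() {
       local baseline value
       baseline="$1"
       value="$2"
       echo $(( ((value - baseline) * 100 + baseline / 2) / baseline ))
   }

   slowdown_pct 981 4996   # MKL-CPU parallel unittest, prints 409
   slowdown_pct 951 4380   # MKL-CPU serial unittest, prints 361
   ```

   Applied to the MKLDNN-MKL-CPU parallel unittest entry, 3899 against 981 gives about +297% rather than the +397% shown, so one of those two figures is probably a typo. Which one cannot be determined from the issue itself.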
   

