[Author of the RFC: @TaoLv]
## **Problem statement**
This RFC discusses the strategies for building MXNet with Intel DNNL, MKL, and 
different OpenMP runtimes on different platforms. It will help to address (or 
at least mitigate) the issues reported in [1][2][3][4][5][6] and pave the way 
towards the CMake build system for the project. Once all of these are in place, 
we can expect a better build experience across different platforms while 
keeping the existing good performance on Linux.

The content can be divided into the following parts:
- Clean up the build flags for DNNL, MKL BLAS, and OpenMP;
- The build logic and decisions for DNNL;
- The build logic and decisions for MKL BLAS;
- Other considerations, e.g. performance and interoperability.

## **Proposed solutions**
**Build Flags**
We propose to keep/promote the flags below in the future CMake build system:

Flags | Options | Description
-- | -- | --
USE_MKLDNN (or USE_DNNL) | **ON**, OFF | Whether to build MXNet with the DNNL library to accelerate some operators.
USE_BLAS | **openblas**, mkl, apple, atlas | Choose the BLAS library.
USE_OPENMP | **ON**, OFF | Whether to use the OpenMP threading model.
MKL_USE_STATIC_LIBS | ON, **OFF** | Whether to link the static libraries of MKL.
MKL_USE_ILP64 | ON, **OFF** | Turn it ON when INT64 tensor support is enabled.

And to deprecate the following flags:

Flags | Justifications
-- | --
USE_MKLML | The MKLML library has reached end of life and has been removed since MKL-DNN v1.0.
USE_MKL2017 | The MKL2017 integration was removed from MXNet when MKL-DNN was integrated.
USE_STATIC_MKL | Replaced by MKL_USE_STATIC_LIBS.
USE_MKL_IF_AVAILABLE | Duplicates USE_BLAS and is confusing when both of them are set.
MKL_USE_SINGLE_DYNAMIC_LIBRARY | We don't want users to link libmkl_rt.so, which requires more explicit control at runtime. Removing it makes the code behavior and library linkage more deterministic.
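
For illustration, the promoted flags could be passed as -D options on the cmake command line or set in a config.cmake-style cache file. The snippet below is only a minimal sketch of such a file; the flag names and defaults follow the table above, and the file layout itself is an assumption, not part of this proposal:

```cmake
# Illustrative config.cmake-style cache entries (sketch only).
set(USE_DNNL            ON    CACHE BOOL   "Build MXNet with the DNNL library")
set(USE_BLAS            "mkl" CACHE STRING "BLAS library: openblas (default), mkl, apple, atlas")
set(USE_OPENMP          ON    CACHE BOOL   "Use the OpenMP threading model")
set(MKL_USE_STATIC_LIBS OFF   CACHE BOOL   "Link the static MKL libraries")
set(MKL_USE_ILP64       OFF   CACHE BOOL   "Use the ILP64 MKL interface for INT64 tensors")
```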


**Build with OpenMP**
Linking multiple OpenMP runtimes into a single application is error prone. To 
mitigate the long-standing OpenMP conflicts in MXNet, we suggest adopting the 
same default linking behavior as the DNNL library, that is, dynamically linking 
the OpenMP runtime library provided by the compiler/system. This will help us by:
- Simplifying the build logic for choosing different OpenMP runtimes;
- Mitigating the interoperability issues between OpenMP runtimes from different 
compiler ecosystems.

Users can decide whether to enable OpenMP threading through the USE_OPENMP flag. 
Once it is set to OFF, backend libraries like DNNL or MKL BLAS should also 
disable OpenMP threading and run in sequential mode.
With this approach, there is no need to distribute the source code of LLVM 
OpenMP or build it from scratch; we can rely on the compilers to pull in their 
own OpenMP runtimes.
Please refer to the 
[OpenMP.cmake](https://github.com/intel/mkl-dnn/blob/master/cmake/OpenMP.cmake) 
module of DNNL for more implementation details.
(A more radical approach is to provide an option for users to choose a 
different OpenMP runtime. That can lead to better performance, e.g. by linking 
Intel OpenMP explicitly, but is more risky.)
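
As a minimal sketch of what this could look like in the MXNet CMake files (the `mxnet` target name is an assumption here; the real implementation should follow DNNL's OpenMP.cmake module referenced above):

```cmake
# Sketch: use the OpenMP runtime shipped with the chosen compiler, or fall
# back to sequential execution when USE_OPENMP=OFF.
if(USE_OPENMP)
  find_package(OpenMP REQUIRED)                      # finds libgomp/libomp/etc. for the compiler
  target_link_libraries(mxnet PUBLIC OpenMP::OpenMP_CXX)
else()
  set(DNNL_CPU_RUNTIME "SEQ" CACHE STRING "" FORCE)  # DNNL runs sequentially as well (see next section)
endif()
```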

**Build with MKL-DNN (or DNNL)**
Intel MKL-DNN was renamed to DNNL in its v1.1 release. Since then, the MXNet 
community has been working on the transition to DNNL to leverage the latest 
features and optimizations from the library. That includes using the string 
“DNNL” or “dnnl” for future development and communication. We propose to 
promote the flag “USE_DNNL” starting with MXNet 2.0 and to deprecate 
“USE_MKLDNN” at the same time.
DNNL source code resides in the 3rdparty/mkldnn folder of the MXNet repository 
and is released and distributed along with the MXNet source code. To build 
MXNet with DNNL and accelerate execution on Intel CPUs, one needs to enable 
-DUSE_DNNL=ON in CMake. Note that this flag is already set to ON by default for 
all platforms except edge devices, so to disable DNNL acceleration one needs to 
set -DUSE_DNNL=OFF explicitly on the CMake command line or in the CMake 
configuration file.
As both MXNet and DNNL are under rapid development with different release 
cadences, we decided to link the DNNL library into MXNet statically to avoid 
mis-linking in the user's environment. Given this, we need to set 
DNNL_LIBRARY_TYPE to STATIC when building DNNL.
Some additional flags for building DNNL (a CMake sketch follows this list):
- DNNL_CPU_RUNTIME: needs to be set to SEQ explicitly when USE_OPENMP=OFF;
- DNNL_ARCH_OPT_FLAGS: architecture-specific compiler options, e.g. -march or 
-mtune for GCC, need to be passed to this flag as a string;
- MKLDNN_BUILD_TESTS and MKLDNN_BUILD_EXAMPLES: we set these two flags to OFF 
to speed up the compilation.
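
A minimal sketch of how the bundled DNNL could be configured from the MXNet CMake files, assuming it is added as a subproject from 3rdparty/mkldnn; the -mtune value is only a placeholder:

```cmake
# Sketch: configure the bundled DNNL before adding it as a subproject.
set(DNNL_LIBRARY_TYPE   "STATIC"         CACHE STRING "" FORCE)  # link DNNL statically
if(NOT USE_OPENMP)
  set(DNNL_CPU_RUNTIME  "SEQ"            CACHE STRING "" FORCE)  # sequential CPU runtime
endif()
set(DNNL_ARCH_OPT_FLAGS "-mtune=generic" CACHE STRING "" FORCE)  # placeholder value
set(MKLDNN_BUILD_TESTS    OFF CACHE BOOL "" FORCE)
set(MKLDNN_BUILD_EXAMPLES OFF CACHE BOOL "" FORCE)
add_subdirectory(3rdparty/mkldnn)
```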

One thing that needs to be taken care of: the headers dnnl_config.h and 
dnnl_version.h are generated dynamically during compilation and are copied to 
the installation destination when calling make install. That means these two 
headers are not distributed with the DNNL source code, so downstream projects 
which include them need to find them in the installation path rather than in 
the source code path.
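
For example, when DNNL is built in-tree via add_subdirectory, the generated headers end up in the build tree while the remaining headers stay in the source tree. The sketch below illustrates the two include paths; the exact directories are assumptions based on the layout described above:

```cmake
# Sketch: point MXNet at both the generated and the regular DNNL headers.
target_include_directories(mxnet PRIVATE
  ${CMAKE_BINARY_DIR}/3rdparty/mkldnn/include   # generated dnnl_config.h / dnnl_version.h
  ${CMAKE_SOURCE_DIR}/3rdparty/mkldnn/include)  # the rest of the DNNL headers
```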

**Build with MKL BLAS**
MXNet users can choose a BLAS library through the flag USE_BLAS which supports 
openblas, mkl, atlas, and apple for MacOS. To use Intel MKL BLAS, one can 
install it through apt or yum following the instructions from Intel: 
[Installing Intel® Performance Libraries and Intel® Distribution for Python* 
Using APT 
Repository](https://software.intel.com/en-us/articles/installing-intel-free-libs-and-python-apt-repo).
 MXNet also provides a tool script for Ubuntu, please refer to the 
ubuntu_mkl.sh under ci/docker/install.
For linking MKL BLAS into MXNet, we suggest following the advice from the 
[Intel® Math Kernel Library Link Line 
Advisor](https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor).
From the advisor tool, we note that:
- Clang is not supported on Linux;
- The LLVM OpenMP runtime is not supported (thus the configuration MXNet + 
Clang + MKL + Linux should not be supported);
- When using GCC on Linux, it is suggested to link the GNU counterpart 
libraries and OpenMP runtime, e.g. libmkl_gnu_thread.a/.so and libgomp.so;
- On Windows/macOS, MKL BLAS can only be used with Intel OpenMP (to be 
confirmed);
- When OpenMP is disabled, we need to link libmkl_sequential.a/.so instead of 
libmkl_gnu_thread.a/.so or libmkl_intel_thread.a/.so;
- For large tensors with one dimension size > INT32_MAX, MKL BLAS requires 
linking libmkl_intel_ilp64.a/.so rather than libmkl_intel_lp64.a/.so.

Given the constraints above, a typical CMake logic for MKL should look as 
follows (see the screenshot below). We still need to add more fine-grained 
checks for different platform and compiler combinations. In this proposal, we 
suggest linking the MKL BLAS libraries dynamically. There are mainly two 
reasons to do so:
- Statically linking the MKL libraries would considerably increase the size of 
the MXNet binary;
- Statically linking the MKL libraries would require recompilation whenever the 
MKL libraries change.

That said, as with other third-party dependencies, static linking can help us 
distribute MXNet to different systems and environments without unexpected 
functionality or performance issues, so we provide the flag MKL_USE_STATIC_LIBS 
to enable static linking when it is needed. Dynamic linking remains the 
default, i.e. MKL_USE_STATIC_LIBS is OFF.

![image](https://user-images.githubusercontent.com/69239501/100760497-19a0c180-33f2-11eb-9cb3-6ece02e83605.png)
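
As a rough textual counterpart to the screenshot above, the selection described by the constraints could look like the sketch below for GCC on Linux with dynamic linking; the variable names and the `mxnet` target are assumptions, and real logic still needs the per-platform and per-compiler checks mentioned above:

```cmake
# Sketch: pick the MKL interface, threading, and OpenMP libraries.
if(MKL_USE_ILP64)
  set(MKL_INTERFACE_LIB mkl_intel_ilp64)  # a dimension may exceed INT32_MAX
else()
  set(MKL_INTERFACE_LIB mkl_intel_lp64)
endif()
if(USE_OPENMP)
  set(MKL_THREADING_LIB mkl_gnu_thread)   # GNU counterpart for GCC on Linux
  set(MKL_OMP_LIB gomp)                   # libgomp.so
else()
  set(MKL_THREADING_LIB mkl_sequential)   # OpenMP disabled: sequential MKL
  set(MKL_OMP_LIB "")
endif()
target_link_libraries(mxnet PRIVATE
  ${MKL_INTERFACE_LIB} ${MKL_THREADING_LIB} mkl_core ${MKL_OMP_LIB} pthread m dl)
```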

**Performance and Interoperability**
- Performance

Although DNNL provides many performance primitives to accelerate the NN-related 
operators in MXNet, we still depend on MKL to improve the performance of other 
linear algebra, random number generation, and vector operations through its 
BLAS, VML, and VSL libraries. We strongly encourage users to build MXNet with 
MKL BLAS, and the community to release convenience binaries with MKL enabled.
Intel OpenMP is commonly observed to outperform other OpenMP runtimes on Intel 
CPUs, so we also hope users can link Intel OpenMP for better performance. But 
since we suggest linking the OpenMP runtime provided by the compiler (see the 
section above), doing so would require enabling the Intel compiler build 
process, which appears to be broken at this moment [6].
Given this, the recommended build on Linux so far is: GCC + MKL BLAS + DNNL + 
GNU OpenMP.

- Interoperability of OpenMP

Though we can address the dual linkage of OpenMP inside MXNet and hence remove 
the conflicts reported in [1], we still need to be aware of the risks in 
downstream projects. One possible scenario is described in [7]: when MXNet is 
linked against one OpenMP runtime and the user combines it with another tool 
linked against a different OpenMP runtime (or another version of the same 
runtime), the conflict can still occur. That is why we suggest choosing the 
OpenMP runtime according to the compiler and linking it dynamically:
> - We assume that an OpenMP runtime has better interoperability within its own 
> compiler ecosystem, e.g. GNU OpenMP in the GCC community.
> - Dynamic linking makes it possible for users to create symbolic links to 
> work around a conflict or to pre-load another runtime.

## **References**
[1] [#17641](https://github.com/apache/incubator-mxnet/issues/17641)
[2] [#17366](https://github.com/apache/incubator-mxnet/issues/17366)
[3] [#10856](https://github.com/apache/incubator-mxnet/issues/10856)
[4] [#9205](https://github.com/apache/incubator-mxnet/issues/9205)
[5] [#11417](https://github.com/apache/incubator-mxnet/issues/11417)
[6] [#14086](https://github.com/apache/incubator-mxnet/issues/14086)
[7] [#8532](https://github.com/apache/incubator-mxnet/issues/8532)
