This is an automated email from the ASF dual-hosted git repository.
bgawrych pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git
The following commit(s) were added to refs/heads/master by this push:
new 024d01e Unify all names used to refer to oneDNN library in logs and docs to oneDNN (#20719)
024d01e is described below
commit 024d01e0d7f4892ad7135faf9f39ac5d20247792
Author: bartekkuncer <[email protected]>
AuthorDate: Mon Nov 22 07:41:25 2021 +0100
Unify all names used to refer to oneDNN library in logs and docs to oneDNN (#20719)
* Unify all names used to refer to oneDNN library in logs and docs to oneDNN
* Reviews
* Update src/operator/nn/dnnl/dnnl_base-inl.h
Co-authored-by: Andrzej Kotłowski <[email protected]>
* Update src/operator/nn/dnnl/dnnl_fully_connected.cc
Co-authored-by: Andrzej Kotłowski <[email protected]>
* Update tests/nightly/test_np_large_array.py
Co-authored-by: Andrzej Kotłowski <[email protected]>
* Fix sanity
Co-authored-by: Andrzej Kotłowski <[email protected]>
---
CMakeLists.txt | 4 +--
benchmark/opperf/README.md | 2 +-
cd/README.md | 8 +++---
cd/utils/artifact_repository.md | 4 +--
cd/utils/artifact_repository.py | 2 +-
cd/utils/test_artifact_repository.py | 6 ++---
ci/dev_menu.py | 4 +--
ci/docker/runtime_functions.sh | 2 +-
ci/jenkins/Jenkins_steps.groovy | 30 +++++++++++-----------
config/darwin.cmake | 2 +-
config/distribution/darwin_cpu.cmake | 2 +-
config/distribution/darwin_cpu_mkl.cmake | 2 +-
config/distribution/darwin_native.cmake | 2 +-
config/distribution/linux_cpu.cmake | 2 +-
config/distribution/linux_cpu_mkl.cmake | 2 +-
config/distribution/linux_cu100.cmake | 2 +-
config/distribution/linux_cu101.cmake | 2 +-
config/distribution/linux_cu102.cmake | 2 +-
config/distribution/linux_cu110.cmake | 2 +-
config/distribution/linux_cu112.cmake | 2 +-
config/distribution/linux_cu92.cmake | 2 +-
config/distribution/linux_native.cmake | 2 +-
config/linux.cmake | 2 +-
config/linux_gpu.cmake | 2 +-
docs/python_docs/python/tutorials/index.rst | 2 +-
.../tutorials/performance/backend/profiler.md | 4 +--
.../src/_includes/get_started/cloud/cpu.md | 2 +-
.../src/_includes/get_started/cloud/gpu.md | 2 +-
.../cpp/docs/tutorials/multi_threaded_inference.md | 2 +-
docs/static_site/src/pages/api/faq/cloud.md | 4 +--
docs/static_site/src/pages/api/faq/env_var.md | 8 +++---
.../src/pages/api/faq/large_tensor_support.md | 4 +--
.../src/pages/api/faq/tensor_inspector_tutorial.md | 2 +-
example/README.md | 2 +-
example/quantization/README.md | 10 ++++----
example/quantization/imagenet_gen_qsym_onednn.py | 2 +-
include/mxnet/ndarray.h | 2 +-
src/c_api/c_api.cc | 6 ++---
src/ndarray/ndarray.cc | 16 ++++++------
src/operator/contrib/batch_norm_relu.cc | 4 +--
src/operator/nn/dnnl/dnnl_base-inl.h | 6 ++---
src/operator/nn/dnnl/dnnl_base.cc | 6 ++---
src/operator/nn/dnnl/dnnl_batch_norm-inl.h | 6 ++---
src/operator/nn/dnnl/dnnl_convolution.cc | 12 ++++-----
src/operator/nn/dnnl/dnnl_fully_connected.cc | 3 ++-
src/operator/nn/dnnl/dnnl_layer_norm.cc | 2 +-
src/operator/nn/dnnl/dnnl_pooling.cc | 10 ++++----
src/operator/nn/dnnl/dnnl_rnn.cc | 4 +--
src/operator/quantization/dnnl/dnnl_quantize-inl.h | 4 +--
.../quantization/dnnl/dnnl_quantize_v2-inl.h | 2 +-
.../quantization/dnnl/dnnl_requantize-inl.h | 2 +-
src/operator/quantization/quantized_batch_norm.cc | 2 +-
src/operator/quantization/quantized_conv.cc | 6 ++---
.../quantization/quantized_elemwise_add.cc | 4 +--
src/operator/quantization/quantized_pooling.cc | 6 ++---
.../subgraph/dnnl/dnnl_batch_dot_property.h | 2 +-
src/operator/subgraph/dnnl/dnnl_conv.cc | 2 +-
src/operator/subgraph/dnnl/dnnl_fc.cc | 2 +-
.../dnnl/dnnl_matmul_post_quantize_property.h | 2 +-
src/operator/tensor/cast_storage-inl.h | 4 +--
src/operator/tensor/elemwise_unary_op.h | 4 +--
tests/cpp/include/test_dnnl.h | 20 +++++++--------
tests/cpp/operator/dnnl_test.cc | 2 +-
tests/nightly/test_np_large_array.py | 2 +-
tests/python/dnnl/subgraphs/test_conv_subgraph.py | 6 ++---
tests/python/gpu/test_gluon_model_zoo_gpu.py | 4 +--
tests/python/quantization/test_quantization.py | 8 +++---
tests/python/unittest/test_numpy_gluon.py | 2 +-
tools/dependencies/README.md | 6 ++---
tools/pip/doc/CPU_ADDITIONAL.md | 2 +-
tools/pip/doc/CU101_ADDITIONAL.md | 2 +-
tools/pip/doc/CU102_ADDITIONAL.md | 2 +-
tools/pip/doc/CU110_ADDITIONAL.md | 2 +-
tools/pip/doc/CU112_ADDITIONAL.md | 2 +-
tools/pip/doc/NATIVE_ADDITIONAL.md | 2 +-
tools/staticbuild/README.md | 4 +--
76 files changed, 161 insertions(+), 160 deletions(-)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 19e1c49..196e007 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -62,9 +62,9 @@ option(USE_F16C "Build with x86 F16C instruction support" ON) # autodetects supp
option(USE_LAPACK "Build with lapack support" ON)
option(USE_MKL_LAYERNORM "Use layer normalization from MKL, which is currently slower than internal. No effect unless USE_BLAS=MKL (or mkl)." OFF)
if((NOT APPLE) AND (NOT MSVC) AND (CMAKE_HOST_SYSTEM_PROCESSOR STREQUAL "x86_64") AND (NOT CMAKE_CROSSCOMPILING))
- option(USE_ONEDNN "Build with ONEDNN support" ON)
+ option(USE_ONEDNN "Build with oneDNN support" ON)
else()
- option(USE_ONEDNN "Build with ONEDNN support" OFF)
+ option(USE_ONEDNN "Build with oneDNN support" OFF)
endif()
cmake_dependent_option(USE_INTGEMM "Build with x86_64 intgemm library for low-precision multiplication" ON "CMAKE_SYSTEM_PROCESSOR STREQUAL x86_64" OFF)
if(NOT MSVC)
diff --git a/benchmark/opperf/README.md b/benchmark/opperf/README.md
index 2d641b6..1a66575 100644
--- a/benchmark/opperf/README.md
+++ b/benchmark/opperf/README.md
@@ -40,7 +40,7 @@ Benchmarks are usually done end-to-end for a given Network Architecture. For exa
2. A standard Network Architecture like ResNet-50 is made up of many operators Ex: Convolution2D, Softmax, Dense and more. Consider the following scenarios:
    1. We improved the performance of Convolution2D operator, but due to a bug, Softmax performance went down. Overall, we may observe end to end benchmarks are running fine, we may miss out the performance degradation of a single operator which can accumulate and become untraceable.
    2. You need to see in a given network, which operator is taking maximum time and plan optimization work. With end to end benchmarks, it is hard to get more fine grained numbers at operator level.
-3. We need to know on different hardware infrastructure (Ex: CPU with ONEDNN, GPU with NVIDIA CUDA and cuDNN) how different operators performs. With these details, we can plan the optimization work at operator level, which could exponentially boost up end to end performance.
+3. We need to know on different hardware infrastructure (Ex: CPU with oneDNN, GPU with NVIDIA CUDA and cuDNN) how different operators performs. With these details, we can plan the optimization work at operator level, which could exponentially boost up end to end performance.
4. You want to have nightly performance tests across all operators in a deep learning framework to catch regressions early.
5. We can integrate this framework with a CI/CD system to run per operator performance tests for PRs. Example: When a PR modifies the kernel of TransposeConv2D, we can run benchmarks of TransposeConv2D operator to verify performance.
diff --git a/cd/README.md b/cd/README.md
index 083cb42..24ee1c0 100644
--- a/cd/README.md
+++ b/cd/README.md
@@ -22,18 +22,18 @@
## Introduction
-MXNet aims to support a variety of frontends, e.g. Python, Java, Perl, R, etc. as well as environments (Windows, Linux, Mac, with or without GPU, with or without ONEDNN support, etc.). This package contains a small continuous delivery (CD) framework used to automate the delivery nightly and release builds across our delivery channels.
+MXNet aims to support a variety of frontends, e.g. Python, Java, Perl, R, etc. as well as environments (Windows, Linux, Mac, with or without GPU, with or without oneDNN support, etc.). This package contains a small continuous delivery (CD) framework used to automate the delivery nightly and release builds across our delivery channels.
<!-- TODO: Add links to the actual jobs, once this is live on PROD -->
The CD process is driven by the [CD pipeline job](Jenkinsfile_cd_pipeline), which orchestrates the order in which the artifacts are delivered. For instance, first publish the libmxnet library before publishing the pip package. It does this by triggering the [release job](Jenkinsfile_release_job) with a specific set of parameters for each delivery channel. The release job executes the specific release pipeline for a delivery channel across all MXNet *variants*.
-A variant is a specific environment or features for which MXNet is compiled. For instance CPU, GPU with CUDA v10.1, CUDA v10.2 with ONEDNN support, etc.
+A variant is a specific environment or features for which MXNet is compiled. For instance CPU, GPU with CUDA v10.1, CUDA v10.2 with oneDNN support, etc.
-Currently, below variants are supported. All of these variants except native have ONEDNN backend enabled.
+Currently, below variants are supported. All of these variants except native have oneDNN backend enabled.
* *cpu*: CPU
-* *native*: CPU without ONEDNN
+* *native*: CPU without oneDNN
* *cu101*: CUDA 10.1
* *cu102*: CUDA 10.2
* *cu110*: CUDA 11.0
diff --git a/cd/utils/artifact_repository.md b/cd/utils/artifact_repository.md
index 3b673c8..e1c70cf 100644
--- a/cd/utils/artifact_repository.md
+++ b/cd/utils/artifact_repository.md
@@ -58,11 +58,11 @@ If not set, derived through the value of sys.platform (https://docs.python.org/3
Manually configured through the --variant argument. The current variants are: cpu, native, cu101, cu102, cu110, cu112.
-As long as the tool is being run from the MXNet code base, the runtime feature detection tool (https://github.com/larroy/mxnet/blob/dd432b7f241c9da2c96bcb877c2dc84e6a1f74d4/docs/api/python/libinfo/libinfo.md) can be used to detect whether the library has been compiled with MKL (library has ONEDNN feature enabled) and/or CUDA support (compiled with CUDA feature enabled).
+As long as the tool is being run from the MXNet code base, the runtime feature detection tool (https://github.com/larroy/mxnet/blob/dd432b7f241c9da2c96bcb877c2dc84e6a1f74d4/docs/api/python/libinfo/libinfo.md) can be used to detect whether the library has been compiled with oneDNN (library has oneDNN feature enabled) and/or CUDA support (compiled with CUDA feature enabled).
If it has been compiled with CUDA support, the output of /usr/local/cuda/bin/nvcc --version can be mined for the exact CUDA version (eg. 8.0, 9.0, etc.).
-By knowing which features are enabled on the binary, and if necessary, which CUDA version is installed on the machine, the value for the variant argument can be calculated. Eg. if CUDA features are enabled, and nvcc reports cuda version 10.2, then the variant would be cu102. If neither ONEDNN nor CUDA features are enabled, the variant would be native.
+By knowing which features are enabled on the binary, and if necessary, which CUDA version is installed on the machine, the value for the variant argument can be calculated. Eg. if CUDA features are enabled, and nvcc reports cuda version 10.2, then the variant would be cu102. If neither oneDNN nor CUDA features are enabled, the variant would be native.
**Dependency Linking**
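The variant-derivation rule described in the paragraph above can be sketched as a small helper. This is an illustrative sketch only, not the actual artifact_repository.py implementation; the function name and signature are assumptions:

```python
def probe_variant(features, cuda_version=None):
    """Derive the MXNet artifact variant from compiled-in features.

    features: dict such as {'ONEDNN': bool, 'CUDA': bool}, as reported by
    the runtime feature-detection tool.
    cuda_version: version string mined from `nvcc --version`, e.g. '102'
    for CUDA 10.2 (None when CUDA is absent).
    """
    if features.get('CUDA') and cuda_version:
        # CUDA builds are named after the toolkit version, e.g. cu102.
        return 'cu{}'.format(cuda_version)
    if features.get('ONEDNN'):
        return 'cpu'
    # Neither oneDNN nor CUDA features enabled.
    return 'native'
```

For example, `probe_variant({'ONEDNN': False, 'CUDA': False})` yields `native`, matching the rule in the text.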
diff --git a/cd/utils/artifact_repository.py b/cd/utils/artifact_repository.py
index 6234ac9..d7c6528 100644
--- a/cd/utils/artifact_repository.py
+++ b/cd/utils/artifact_repository.py
@@ -313,7 +313,7 @@ def probe_gpu_variant(mxnet_features: Dict[str, bool]) -> Optional[str]:
if cuda_version:
variant = 'cu{}'.format(cuda_version)
if not mxnet_features['ONEDNN']:
- RuntimeError('Error determining mxnet variant: ONEDNN should be enabled for cuda variants')
+ RuntimeError('Error determining mxnet variant: oneDNN should be enabled for cuda variants')
logger.debug('variant is: {}'.format(variant))
return variant
diff --git a/cd/utils/test_artifact_repository.py b/cd/utils/test_artifact_repository.py
index a3f0444..b75e2fb 100644
--- a/cd/utils/test_artifact_repository.py
+++ b/cd/utils/test_artifact_repository.py
@@ -161,7 +161,7 @@ class TestArtifactRepositoryTool(unittest.TestCase):
@patch('artifact_repository.get_libmxnet_features')
def test_probe_variant_native(self, mock_features):
"""
- Tests 'native' is returned if ONEDNN and CUDA features are OFF
+ Tests 'native' is returned if oneDNN and CUDA features are OFF
"""
mock_features.return_value = {'ONEDNN': False, 'CUDA': False}
self.assertEqual(probe_mxnet_variant('libmxnet.so'), 'native')
@@ -169,7 +169,7 @@ class TestArtifactRepositoryTool(unittest.TestCase):
@patch('artifact_repository.get_libmxnet_features')
def test_probe_variant_cpu(self, mock_features):
"""
- Tests 'cpu' is returned if ONEDNN is ON and CUDA is OFF
+ Tests 'cpu' is returned if oneDNN is ON and CUDA is OFF
"""
mock_features.return_value = {'ONEDNN': True, 'CUDA': False}
self.assertEqual(probe_mxnet_variant('libmxnet.so'), 'cpu')
@@ -178,7 +178,7 @@ class TestArtifactRepositoryTool(unittest.TestCase):
@patch('artifact_repository.get_cuda_version')
def test_probe_variant_cuda(self, mock_cuda_version, mock_features):
"""
- Tests 'cu102' is returned if ONEDNN is OFF and CUDA is ON and CUDA version is 10.2
+ Tests 'cu102' is returned if oneDNN is OFF and CUDA is ON and CUDA version is 10.2
"""
mock_features.return_value = {'ONEDNN': True, 'CUDA': True}
mock_cuda_version.return_value = '102'
diff --git a/ci/dev_menu.py b/ci/dev_menu.py
index a21129c..c86eb0f 100644
--- a/ci/dev_menu.py
+++ b/ci/dev_menu.py
@@ -141,12 +141,12 @@ COMMANDS = OrderedDict([
"ci/build.py --nvidiadocker --platform ubuntu_gpu /work/runtime_functions.sh build_ubuntu_gpu",
"ci/build.py --nvidiadocker --platform ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_python3_gpu",
]),
- ('[Docker] Python3 GPU+ONEDNN unittests',
+ ('[Docker] Python3 GPU+oneDNN unittests',
[
"ci/build.py --nvidiadocker --platform ubuntu_gpu /work/runtime_functions.sh build_ubuntu_gpu_onednn",
"ci/build.py --nvidiadocker --platform ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_python3_gpu",
]),
- ('[Docker] Python3 CPU Intel ONEDNN unittests',
+ ('[Docker] Python3 CPU oneDNN unittests',
[
"ci/build.py --platform ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_onednn",
"ci/build.py --platform ubuntu_cpu /work/runtime_functions.sh unittest_ubuntu_python3_cpu",
diff --git a/ci/docker/runtime_functions.sh b/ci/docker/runtime_functions.sh
index 19824ff..06a28d1 100755
--- a/ci/docker/runtime_functions.sh
+++ b/ci/docker/runtime_functions.sh
@@ -1420,7 +1420,7 @@ build_static_libmxnet() {
# Tests CD PyPI packaging in CI
ci_package_pypi() {
set -ex
- # copies onednn header files to 3rdparty/onednn/include/oneapi/dnnl/ as in CD
+ # copies oneDNN header files to 3rdparty/onednn/include/oneapi/dnnl/ as in CD
mkdir -p 3rdparty/onednn/include/oneapi/dnnl
cp include/onednn/oneapi/dnnl/dnnl_version.h 3rdparty/onednn/include/oneapi/dnnl/.
cp include/onednn/oneapi/dnnl/dnnl_config.h 3rdparty/onednn/include/oneapi/dnnl/.
diff --git a/ci/jenkins/Jenkins_steps.groovy b/ci/jenkins/Jenkins_steps.groovy
index e6f4080..cfd5f61 100644
--- a/ci/jenkins/Jenkins_steps.groovy
+++ b/ci/jenkins/Jenkins_steps.groovy
@@ -174,7 +174,7 @@ def compile_unix_mkl_cpu(lib_name) {
}
def compile_unix_onednn_cpu(lib_name) {
- return ['CPU: ONEDNN': {
+ return ['CPU: oneDNN': {
node(NODE_LINUX_CPU) {
ws('workspace/build-onednn-cpu') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -188,7 +188,7 @@ def compile_unix_onednn_cpu(lib_name) {
}
def compile_unix_onednn_mkl_cpu(lib_name) {
- return ['CPU: ONEDNN_MKL': {
+ return ['CPU: oneDNN-MKL': {
node(NODE_LINUX_CPU) {
ws('workspace/build-onednn-cpu') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -202,7 +202,7 @@ def compile_unix_onednn_mkl_cpu(lib_name) {
}
def compile_unix_onednn_gpu(lib_name) {
- return ['GPU: ONEDNN': {
+ return ['GPU: oneDNN': {
node(NODE_LINUX_CPU) {
ws('workspace/build-onednn-gpu') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -216,7 +216,7 @@ def compile_unix_onednn_gpu(lib_name) {
}
def compile_unix_onednn_nocudnn_gpu(lib_name) {
- return ['GPU: ONEDNN_CUDNNOFF': {
+ return ['GPU: oneDNN-CUDNNOFF': {
node(NODE_LINUX_CPU) {
ws('workspace/build-onednn-gpu-nocudnn') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -286,7 +286,7 @@ def compile_centos7_cpu(lib_name) {
}
def compile_centos7_cpu_onednn() {
- return ['CPU: CentOS 7 ONEDNN': {
+ return ['CPU: CentOS 7 oneDNN': {
node(NODE_LINUX_CPU) {
ws('workspace/build-centos7-onednn') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -353,7 +353,7 @@ def compile_unix_clang_tidy_cpu() {
}
def compile_unix_clang_6_onednn_cpu() {
- return ['CPU: Clang 6 ONEDNN': {
+ return ['CPU: Clang 6 oneDNN': {
node(NODE_LINUX_CPU) {
ws('workspace/build-cpu-onednn-clang6') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -367,7 +367,7 @@ def compile_unix_clang_6_onednn_cpu() {
// TODO(leezu) delete once DUSE_DIST_KVSTORE=ON builds in -WError build
def compile_unix_clang_10_onednn_cpu() {
- return ['CPU: Clang 10 ONEDNN': {
+ return ['CPU: Clang 10 oneDNN': {
node(NODE_LINUX_CPU) {
ws('workspace/build-cpu-onednn-clang100') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -531,7 +531,7 @@ def compile_windows_cpu(lib_name) {
}
def compile_windows_cpu_onednn(lib_name) {
- return ['Build CPU ONEDNN windows':{
+ return ['Build CPU oneDNN windows':{
node(NODE_WINDOWS_CPU) {
ws('workspace/build-cpu-onednn') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -545,7 +545,7 @@ def compile_windows_cpu_onednn(lib_name) {
}
def compile_windows_cpu_onednn_mkl(lib_name) {
- return ['Build CPU ONEDNN MKL windows':{
+ return ['Build CPU oneDNN MKL windows':{
node(NODE_WINDOWS_CPU) {
ws('workspace/build-cpu-onednn-mkl') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -587,7 +587,7 @@ def compile_windows_gpu(lib_name) {
}
def compile_windows_gpu_onednn(lib_name) {
- return ['Build GPU ONEDNN windows':{
+ return ['Build GPU oneDNN windows':{
node(NODE_WINDOWS_CPU) {
ws('workspace/build-gpu') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -765,7 +765,7 @@ def test_unix_python3_onnx_cpu(lib_name) {
}
def test_unix_python3_onednn_cpu(lib_name) {
- return ['Python3: ONEDNN-CPU': {
+ return ['Python3: oneDNN-CPU': {
node(NODE_LINUX_CPU) {
ws('workspace/ut-python3-onednn-cpu') {
try {
@@ -782,7 +782,7 @@ def test_unix_python3_onednn_cpu(lib_name) {
}
def test_unix_python3_onednn_mkl_cpu(lib_name) {
- return ['Python3: ONEDNN-MKL-CPU': {
+ return ['Python3: oneDNN-MKL-CPU': {
node(NODE_LINUX_CPU) {
ws('workspace/ut-python3-onednn-mkl-cpu') {
try {
@@ -799,7 +799,7 @@ def test_unix_python3_onednn_mkl_cpu(lib_name) {
}
def test_unix_python3_onednn_gpu(lib_name) {
- return ['Python3: ONEDNN-GPU': {
+ return ['Python3: oneDNN-GPU': {
node(NODE_LINUX_GPU_G4) {
ws('workspace/ut-python3-onednn-gpu') {
try {
@@ -815,7 +815,7 @@ def test_unix_python3_onednn_gpu(lib_name) {
}
def test_unix_python3_onednn_nocudnn_gpu(lib_name) {
- return ['Python3: ONEDNN-GPU-NOCUDNN': {
+ return ['Python3: oneDNN-GPU-NOCUDNN': {
node(NODE_LINUX_GPU_G4) {
ws('workspace/ut-python3-onednn-gpu-nocudnn') {
try {
@@ -1009,7 +1009,7 @@ def test_windows_python3_gpu(lib_name) {
}
def test_windows_python3_gpu_onednn(lib_name) {
- return ['Python 3: ONEDNN-GPU Win':{
+ return ['Python 3: oneDNN-GPU Win':{
node(NODE_WINDOWS_GPU) {
timeout(time: max_time, unit: 'MINUTES') {
ws('workspace/ut-python-gpu') {
diff --git a/config/darwin.cmake b/config/darwin.cmake
index 1015a2f..d64379c 100644
--- a/config/darwin.cmake
+++ b/config/darwin.cmake
@@ -45,7 +45,7 @@ set(OPENCV_ROOT "" CACHE BOOL "OpenCV install path. Supports autodetection.")
set(USE_OPENMP OFF CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
diff --git a/config/distribution/darwin_cpu.cmake b/config/distribution/darwin_cpu.cmake
index ddda2ca..c7ce88a 100644
--- a/config/distribution/darwin_cpu.cmake
+++ b/config/distribution/darwin_cpu.cmake
@@ -24,7 +24,7 @@ set(USE_BLAS "apple" CACHE STRING "BLAS Vendor")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP OFF CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/darwin_cpu_mkl.cmake b/config/distribution/darwin_cpu_mkl.cmake
index f4e54a8..b49e203 100644
--- a/config/distribution/darwin_cpu_mkl.cmake
+++ b/config/distribution/darwin_cpu_mkl.cmake
@@ -25,7 +25,7 @@ set(BLA_STATIC ON CACHE BOOL "Use static libraries")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP OFF CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/darwin_native.cmake b/config/distribution/darwin_native.cmake
index 4b256c6..dd6815d 100644
--- a/config/distribution/darwin_native.cmake
+++ b/config/distribution/darwin_native.cmake
@@ -24,7 +24,7 @@ set(USE_BLAS "apple" CACHE STRING "BLAS Vendor")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP OFF CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN OFF CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN OFF CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cpu.cmake b/config/distribution/linux_cpu.cmake
index 9b8a979..cb0576f 100644
--- a/config/distribution/linux_cpu.cmake
+++ b/config/distribution/linux_cpu.cmake
@@ -23,7 +23,7 @@ set(USE_BLAS "open" CACHE STRING "BLAS Vendor")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cpu_mkl.cmake b/config/distribution/linux_cpu_mkl.cmake
index 3f8dcfc..afeb3bb 100644
--- a/config/distribution/linux_cpu_mkl.cmake
+++ b/config/distribution/linux_cpu_mkl.cmake
@@ -25,7 +25,7 @@ set(BLA_STATIC ON CACHE BOOL "Use static libraries")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu100.cmake b/config/distribution/linux_cu100.cmake
index 35ec5a3..78bcfae 100644
--- a/config/distribution/linux_cu100.cmake
+++ b/config/distribution/linux_cu100.cmake
@@ -25,7 +25,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu101.cmake b/config/distribution/linux_cu101.cmake
index 80f522d..bbe3e9f 100644
--- a/config/distribution/linux_cu101.cmake
+++ b/config/distribution/linux_cu101.cmake
@@ -27,7 +27,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu102.cmake b/config/distribution/linux_cu102.cmake
index d580354..a01662a 100644
--- a/config/distribution/linux_cu102.cmake
+++ b/config/distribution/linux_cu102.cmake
@@ -25,7 +25,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu110.cmake b/config/distribution/linux_cu110.cmake
index 0c239cb..1348da6 100644
--- a/config/distribution/linux_cu110.cmake
+++ b/config/distribution/linux_cu110.cmake
@@ -25,7 +25,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu112.cmake b/config/distribution/linux_cu112.cmake
index 031d129..87da1ad 100644
--- a/config/distribution/linux_cu112.cmake
+++ b/config/distribution/linux_cu112.cmake
@@ -25,7 +25,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu92.cmake b/config/distribution/linux_cu92.cmake
index 9466a52..a65a667 100644
--- a/config/distribution/linux_cu92.cmake
+++ b/config/distribution/linux_cu92.cmake
@@ -25,7 +25,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_native.cmake b/config/distribution/linux_native.cmake
index a0900f3..0ea1816 100644
--- a/config/distribution/linux_native.cmake
+++ b/config/distribution/linux_native.cmake
@@ -23,7 +23,7 @@ set(USE_BLAS "open" CACHE STRING "BLAS Vendor")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN OFF CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN OFF CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/linux.cmake b/config/linux.cmake
index 0a0f2d9..ec02d9d 100644
--- a/config/linux.cmake
+++ b/config/linux.cmake
@@ -62,7 +62,7 @@ set(OPENCV_ROOT "" CACHE BOOL "OpenCV install path. Supports autodetection.")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
diff --git a/config/linux_gpu.cmake b/config/linux_gpu.cmake
index 42ebc11..53e096f 100644
--- a/config/linux_gpu.cmake
+++ b/config/linux_gpu.cmake
@@ -66,7 +66,7 @@ set(OPENCV_ROOT "" CACHE BOOL "OpenCV install path. Supports autodetection.")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
diff --git a/docs/python_docs/python/tutorials/index.rst b/docs/python_docs/python/tutorials/index.rst
index e9a61be..7a6bae3 100644
--- a/docs/python_docs/python/tutorials/index.rst
+++ b/docs/python_docs/python/tutorials/index.rst
@@ -85,7 +85,7 @@ Performance
.. card::
:title: oneDNN
- :link: performance/backend/mkldnn/index.html
+ :link: performance/backend/dnnl/index.html
How to get the most from your CPU by using oneDNN.
diff --git a/docs/python_docs/python/tutorials/performance/backend/profiler.md b/docs/python_docs/python/tutorials/performance/backend/profiler.md
index a54892d..216722a 100644
--- a/docs/python_docs/python/tutorials/performance/backend/profiler.md
+++ b/docs/python_docs/python/tutorials/performance/backend/profiler.md
@@ -210,8 +210,8 @@ Let's zoom in to check the time taken by operators
The above picture visualizes the sequence in which the operators were executed and the time taken by each operator.
-### Profiling ONEDNN Operators
-Reagrding ONEDNN operators, the library has already provided the internal profiling tool. Firstly, you need set `DNNL_VERBOSE=1` to enable internal profiler.
+### Profiling oneDNN Operators
+Reagrding oneDNN operators, the library has already provided the internal profiling tool. Firstly, you need set `DNNL_VERBOSE=1` to enable internal profiler.
`$ DNNL_VERBOSE=1 python my_script.py > dnnl_verbose.log`
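Once a `dnnl_verbose.log` has been captured as above, its per-primitive timings can be aggregated with a short script. This is a sketch that assumes the comma-separated `dnnl_verbose` line format, in which the fourth field names the primitive and the last field is the execution time in milliseconds; the helper name is illustrative:

```python
from collections import defaultdict

def summarize_dnnl_verbose(lines):
    """Sum execution time (ms) per primitive kind from DNNL_VERBOSE output."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.strip().split(',')
        # Only 'exec' records carry timings; skip banner/info lines.
        if len(fields) < 5 or fields[0] != 'dnnl_verbose' or fields[1] != 'exec':
            continue
        try:
            totals[fields[3]] += float(fields[-1])
        except ValueError:
            continue
    return dict(totals)

# Example usage:
# with open('dnnl_verbose.log') as f:
#     print(summarize_dnnl_verbose(f))
```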
diff --git a/docs/static_site/src/_includes/get_started/cloud/cpu.md b/docs/static_site/src/_includes/get_started/cloud/cpu.md
index 810233f..6813f37 100644
--- a/docs/static_site/src/_includes/get_started/cloud/cpu.md
+++ b/docs/static_site/src/_includes/get_started/cloud/cpu.md
@@ -13,4 +13,4 @@ the [Download page](https://mxnet.apache.org/get_started/download).
* **Amazon Web Services**
- [AWS Deep Learning AMI](https://aws.amazon.com/machine-learning/amis/) - Preinstalled Conda environments
-for Python 2 or 3 with MXNet and ONEDNN.
+for Python 2 or 3 with MXNet and oneDNN.
diff --git a/docs/static_site/src/_includes/get_started/cloud/gpu.md
b/docs/static_site/src/_includes/get_started/cloud/gpu.md
index 3a951ab..c21ba38 100644
--- a/docs/static_site/src/_includes/get_started/cloud/gpu.md
+++ b/docs/static_site/src/_includes/get_started/cloud/gpu.md
@@ -18,7 +18,7 @@
VM](https://docs.nvidia.com/ngc/ngc-alibaba-setup-guide/launching-nv-cloud-vm-co
MXNet models
- [AWS Deep Learning AMI](https://aws.amazon.com/machine-learning/amis/) -
Preinstalled
Conda environments
-for Python 2 or 3 with MXNet, CUDA, cuDNN, ONEDNN, and AWS Elastic Inference
+for Python 2 or 3 with MXNet, CUDA, cuDNN, oneDNN, and AWS Elastic Inference
- [Dynamic Training on
AWS](https://github.com/awslabs/dynamic-training-with-apache-mxnet-on-aws) -
experimental manual EC2 setup or semi-automated CloudFormation setup
diff --git
a/docs/static_site/src/pages/api/cpp/docs/tutorials/multi_threaded_inference.md
b/docs/static_site/src/pages/api/cpp/docs/tutorials/multi_threaded_inference.md
index 086e440..89fbfae 100644
---
a/docs/static_site/src/pages/api/cpp/docs/tutorials/multi_threaded_inference.md
+++
b/docs/static_site/src/pages/api/cpp/docs/tutorials/multi_threaded_inference.md
@@ -163,7 +163,7 @@ The above code outputs results for different threads and
cleans up the thread sa
1. Only operators tested with the existing model coverage are supported. Other
operators and operator types (stateful operators, custom operators) are not
supported. Existing model coverage is as follows (this list will keep growing
as we test more models with different model types):
-|Models Tested|ONEDNN|CUDNN|NO-CUDNN|
+|Models Tested|oneDNN|CUDNN|NO-CUDNN|
| --- | --- | --- | --- |
| imagenet1k-resnet-18 | Yes | Yes | Yes |
| imagenet1k-resnet-152 | Yes | Yes | Yes |
diff --git a/docs/static_site/src/pages/api/faq/cloud.md
b/docs/static_site/src/pages/api/faq/cloud.md
index 0b7498e..9668f4b 100644
--- a/docs/static_site/src/pages/api/faq/cloud.md
+++ b/docs/static_site/src/pages/api/faq/cloud.md
@@ -54,8 +54,8 @@ on how to connect to a Jupyter notebook running on an EC2
instance.
### Set Up an EC2 GPU Instance from Scratch
[Deep Learning Base
AMIs](https://aws.amazon.com/marketplace/search/results?x=0&y=0&searchTerms=Deep+Learning+Base+AMI)
-provide a foundational image with NVIDIA CUDA, cuDNN, GPU drivers, Intel
-ONEDNN, Docker and Nvidia-Docker, etc. for deploying your own custom deep
+provide a foundational image with NVIDIA CUDA, cuDNN, GPU drivers, oneDNN,
+Docker and Nvidia-Docker, etc. for deploying your own custom deep
learning environment. You may follow the [MXNet Build From Source
instructions](https://mxnet.apache.org/get_started/build_from_source) easily on
the Deep Learning Base AMIs.
diff --git a/docs/static_site/src/pages/api/faq/env_var.md
b/docs/static_site/src/pages/api/faq/env_var.md
index 1ecd30f..dad481c 100644
--- a/docs/static_site/src/pages/api/faq/env_var.md
+++ b/docs/static_site/src/pages/api/faq/env_var.md
@@ -372,12 +372,12 @@ If ctypes is used, it must be
`mxnet._ctypes.ndarray.NDArrayBase`.
* MXNET_ONEDNN_ENABLED
- Values: 0, 1 ```(default=1)```
- - Flag to enable or disable ONEDNN accelerator. On by default.
- - Only applies to mxnet that has been compiled with ONEDNN (```pip install
mxnet``` or built from source with ```USE_ONEDNN=1```)
+ - Flag to enable or disable the oneDNN accelerator. On by default.
+ - Only applies to mxnet that has been compiled with oneDNN (```pip install
mxnet``` or built from source with ```USE_ONEDNN=1```)
* MXNET_ONEDNN_CACHE_NUM
- Values: Int ```(default=-1)```
- - Flag to set num of elements that ONEDNN cache can hold. Default is -1
which means cache size is unbounded. Should only be set if your model has
variable input shapes, as cache size may grow unbounded. The number represents
the number of items in the cache and is proportional to the number of layers
that use ONEDNN and different input shape.
+ - Sets the number of elements that the oneDNN cache can hold. Default is -1,
which means the cache size is unbounded. Should only be set if your model has
variable input shapes, as the cache may otherwise grow unbounded. The number
represents the number of items in the cache and is proportional to the number
of layers that use oneDNN and the number of different input shapes.
* MXNET_ONEDNN_FORCE_FC_AB_FORMAT
- Values: 0, 1 ```(default=0)```
@@ -446,7 +446,7 @@ If ctypes is used, it must be
`mxnet._ctypes.ndarray.NDArrayBase`.
* MXNET_USE_ONEDNN_RNN
- Values: 0(false) or 1(true) ```(default=1)```
- - This variable controls whether to use the ONEDNN backend in fused RNN
operator for CPU context. There are two fusion implementations of RNN operator
in MXNet. The ONEDNN implementation has a better performance than the naive
one, but the latter is more stable in the backward operation currently.
+ - This variable controls whether to use the oneDNN backend in the fused RNN
operator for the CPU context. There are two fused implementations of the RNN
operator in MXNet. The oneDNN implementation has better performance than the
naive one, but the latter is currently more stable in the backward operation.
* MXNET_FC_TRUE_FP16
- Values: 0(false) or 1(true) ```(default=0)```
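Since these variables are read when the library loads, they have to be exported before MXNet is imported. A minimal sketch using the variable names documented above (the values shown are arbitrary, and the switches only take effect on a build compiled with oneDNN):

```python
# Sketch: configure the oneDNN switches documented above. These must be
# set before `import mxnet`, because they are read at library load time.
import os

os.environ["MXNET_ONEDNN_ENABLED"] = "0"     # disable the oneDNN accelerator
os.environ["MXNET_ONEDNN_CACHE_NUM"] = "64"  # bound the primitive cache size

# import mxnet as mx  # the import would now pick up the settings above
print(os.environ["MXNET_ONEDNN_ENABLED"], os.environ["MXNET_ONEDNN_CACHE_NUM"])
```

Exporting the same variables in the shell (`MXNET_ONEDNN_ENABLED=0 python my_script.py`) is equivalent.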
diff --git a/docs/static_site/src/pages/api/faq/large_tensor_support.md
b/docs/static_site/src/pages/api/faq/large_tensor_support.md
index 247720f..c7c3f74 100644
--- a/docs/static_site/src/pages/api/faq/large_tensor_support.md
+++ b/docs/static_site/src/pages/api/faq/large_tensor_support.md
@@ -141,9 +141,9 @@ Backward pass is partially supported and not completely
tested, so it is conside
Not supported:
-* GPU and ONEDNN.
+* GPU and oneDNN.
* Windows, ARM or any operating system other than Ubuntu
-* Any permutation of MXNet wheel that contains ONEDNN.
+* Any permutation of MXNet wheel that contains oneDNN.
* Other language bindings like Scala, Java, R, and Julia.
diff --git a/docs/static_site/src/pages/api/faq/tensor_inspector_tutorial.md
b/docs/static_site/src/pages/api/faq/tensor_inspector_tutorial.md
index 1212524..3e6a74c 100644
--- a/docs/static_site/src/pages/api/faq/tensor_inspector_tutorial.md
+++ b/docs/static_site/src/pages/api/faq/tensor_inspector_tutorial.md
@@ -168,7 +168,7 @@ Notice: in `interactive_print()`, you could also do value
dumping with command "
### Test Coverage and Limitations
-This utility has been tested on Mac and Ubuntu with and without CUDNN and
ONEDNN. Supports for `Tensor`, `TBlob`, and `NDArray`, as well as for CPU and
GPU have been manually tested.
+This utility has been tested on Mac and Ubuntu with and without CUDNN and
oneDNN. Support for `Tensor`, `TBlob`, and `NDArray`, as well as for CPU and
GPU, has been manually tested.
Currently, this utility only supports non-empty tensors and tensors with known
shapes i.e. `tb_.ndim() > 0`. Also, this utility only supports dense `NDArray`
objects, i.e. when the type is `kDefaultStorage`.
diff --git a/example/README.md b/example/README.md
index bd985a2..4e9023a 100644
--- a/example/README.md
+++ b/example/README.md
@@ -109,7 +109,7 @@ If your tutorial depends on specific packages, simply add
them to this provision
* [Kaggle 2nd national data science bowl](kaggle-ndsb2) - a tutorial for
Kaggle Second Nation Data Science Bowl
* [Multi-task Learning](multi-task) - how to use MXNet for multi-task learning
* [Profiling](profiler) - generate profiling results in json files
-* [Quantization and Calibration Examples](quantization) - examples of
quantizing a FP32 model to INT8 and performing low-precision inference with
Intel ONEDNN on CPU or cuDNN on GPU
+* [Quantization and Calibration Examples](quantization) - examples of
quantizing a FP32 model to INT8 and performing low-precision inference with
oneDNN on CPU or cuDNN on GPU
* [Recommender Systems](recommenders) - examples of how to build various kinds
of recommender systems
* [Restricted Boltzmann Machine](restricted-boltzmann-machine) - an example of
the binary restricted Boltzmann machine learning MNIST
* [Single Shot MultiBox Detector](ssd) - SSD object recognition example
diff --git a/example/quantization/README.md b/example/quantization/README.md
index 3370ada..fa060b9 100644
--- a/example/quantization/README.md
+++ b/example/quantization/README.md
@@ -20,11 +20,11 @@
# Model Quantization with Calibration Examples
-This folder contains examples of quantizing a FP32 model with Intel® oneAPI
Deep Neural Network Library (oneDNN) to (U)INT8 model.
+This folder contains examples of quantizing a FP32 model to a (U)INT8 model
with the oneAPI Deep Neural Network Library (oneDNN).
-<h2 id="1">Model Quantization with Intel® oneDNN</h2>
+<h2 id="1">Model Quantization with oneDNN</h2>
-Intel® oneDNN supports quantization with subgraph features on Intel® CPU
Platform and can bring performance improvements on the [Intel® Xeon® Scalable
Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html).
+oneDNN supports quantization with subgraph features on Intel® CPU Platform and
can bring performance improvements on the [Intel® Xeon® Scalable
Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html).
```
usage: python imagenet_gen_qsym_onednn.py [-h] [--model MODEL] [--epoch EPOCH]
@@ -38,7 +38,7 @@ usage: python imagenet_gen_qsym_onednn.py [-h] [--model
MODEL] [--epoch EPOCH]
[--quantized-dtype {auto,int8,uint8}]
[--quiet]
-Generate a calibrated quantized model from a FP32 model with Intel oneDNN
support
+Generate a calibrated quantized model from a FP32 model with oneDNN support
optional arguments:
-h, --help show this help message and exit
@@ -87,7 +87,7 @@ optional arguments:
--quiet suppress most of log
```
-A new benchmark script `launch_inference_onednn.sh` has been designed to
launch performance benchmark for FP32 or INT8 image-classification models with
Intel® oneDNN.
+A new benchmark script `launch_inference_onednn.sh` has been designed to
launch performance benchmarks for FP32 or INT8 image-classification models
with oneDNN.
```
usage: bash ./launch_inference_onednn.sh -s symbol_file [-b batch_size] [-iter
iteration] [-ins instance] [-c cores/instance] [-h]
diff --git a/example/quantization/imagenet_gen_qsym_onednn.py
b/example/quantization/imagenet_gen_qsym_onednn.py
index c8e6709..65454a3 100644
--- a/example/quantization/imagenet_gen_qsym_onednn.py
+++ b/example/quantization/imagenet_gen_qsym_onednn.py
@@ -100,7 +100,7 @@ def get_exclude_symbols(model_name, exclude_first_conv):
if __name__ == '__main__':
- parser = argparse.ArgumentParser(description='Generate a calibrated
quantized model from a FP32 model with Intel oneDNN support')
+ parser = argparse.ArgumentParser(description='Generate a calibrated
quantized model from a FP32 model with oneDNN support')
parser.add_argument('--model', type=str, default='resnet50_v1',
help='model to be quantized. If no-pretrained is set
then'
'model must be provided to `model` directory in
the same path'
diff --git a/include/mxnet/ndarray.h b/include/mxnet/ndarray.h
index 5e6af4d..0e7fee1 100644
--- a/include/mxnet/ndarray.h
+++ b/include/mxnet/ndarray.h
@@ -739,7 +739,7 @@ class NDArray {
*/
explicit NDArray(const dnnl::memory::desc& md);
/*
- * Test if the data is stored in one of special DNNL format.
+ * Test if the data is stored in one of special DNNL formats.
*/
bool IsDNNLData() const {
return ptr_->IsDNNL();
diff --git a/src/c_api/c_api.cc b/src/c_api/c_api.cc
index d69db4e..0bc54bf 100644
--- a/src/c_api/c_api.cc
+++ b/src/c_api/c_api.cc
@@ -163,7 +163,7 @@ void CustomFComputeDispatcher(const std::string op_name,
std::vector<size_t> in_verIDs, out_verIDs;
std::vector<const char*> in_dev_type, out_dev_type;
std::vector<int> in_dev_id, out_dev_id;
- std::vector<NDArray> conv_mkl; // converted NDArrays from DNNL format
+ std::vector<NDArray> conv_dnnl; // converted NDArrays from DNNL format
// Extra data for sparse inputs and outputs.
std::vector<int> in_stypes(inputs.size(), 0), out_stypes(outputs.size(), 0);
@@ -179,8 +179,8 @@ void CustomFComputeDispatcher(const std::string op_name,
// reorder data if in DNNL format
if (in_nd->IsDNNLData()) {
// convert from DNNL
- conv_mkl.push_back(in_nd->Reorder2Default());
- in_nd = &(conv_mkl.back());
+ conv_dnnl.push_back(in_nd->Reorder2Default());
+ in_nd = &(conv_dnnl.back());
}
#endif
// pull out parts to pass over to library
diff --git a/src/ndarray/ndarray.cc b/src/ndarray/ndarray.cc
index cdbb764..8c955bd 100644
--- a/src/ndarray/ndarray.cc
+++ b/src/ndarray/ndarray.cc
@@ -603,7 +603,7 @@ void NDArray::Chunk::SetMKLMem(const mxnet::TShape& shape,
int dtype) {
for (size_t i = 0; i < dims.size(); i++)
dims[i] = shape[i];
} else {
- LOG(FATAL) << "DNNL doesn't support " << shape.ndim() << " dimensions";
+ LOG(FATAL) << "oneDNN doesn't support " << shape.ndim() << " dimensions";
}
dnnl::memory::format_tag layout = dnnl::memory::format_tag::undef;
switch (dims.size()) {
@@ -626,7 +626,7 @@ void NDArray::Chunk::SetMKLMem(const mxnet::TShape& shape,
int dtype) {
layout = dnnl::memory::format_tag::abcdef;
break;
default:
- LOG(FATAL) << "Not implemented dimension (" << dims.size() << ") for
DNNL";
+ LOG(FATAL) << "Not implemented dimension (" << dims.size() << ") for
oneDNN";
}
dnnl::memory::desc data_md{dims, get_dnnl_type(dtype), layout};
if (shandle.dptr == nullptr) {
@@ -639,7 +639,7 @@ void NDArray::Chunk::SetMKLMem(const mxnet::TShape& shape,
int dtype) {
const dnnl::memory* NDArray::GetDNNLData(const dnnl::memory::desc& desc) const
{
if (desc.get_size() != shape().Size() * GetTypeSize(dtype_)) {
- LOG(FATAL) << "The size of NDArray doesn't match the requested DNNL memory
desc";
+ LOG(FATAL) << "The size of NDArray doesn't match the requested oneDNN
memory desc";
return nullptr;
}
const dnnl::memory* mem = GetDNNLData();
@@ -705,7 +705,7 @@ NDArray NDArray::Reorder2Default() const {
if (!ptr_->dnnl_mem_->IsDNNL())
return *this;
- // create new ndarray from dnnl layout
+ // create new ndarray from dnnl layout
dnnl::memory::desc from_desc = ptr_->dnnl_mem_->GetDesc();
mxnet::TShape tshape(from_desc.data.ndims, -1);
for (int i = 0; i < from_desc.data.ndims; i++)
@@ -863,7 +863,7 @@ void NDArray::CopyFrom(const dnnl::memory& mem) {
return;
CHECK(mem.get_desc().get_size() == shape().Size() * GetTypeSize(dtype_))
- << "The size of NDArray doesn't match the requested DNNL memory desc";
+ << "The size of NDArray doesn't match the requested oneDNN memory desc";
// If this array uses DNNL layout, we have to make sure it's not a view.
// Otherwise, we'll have to change the layout inside the array.
@@ -876,8 +876,8 @@ void NDArray::CopyFrom(const dnnl::memory& mem) {
dnnl::memory* NDArray::CreateDNNLData(const dnnl::memory::desc& desc) {
if (desc.get_size() != shape().Size() * GetTypeSize(dtype_)) {
- LOG(FATAL) << "The size of NDArray doesn't match the requested DNNL memory
desc. "
- << "DNNL memory requests for " << desc.get_size() << " bytes,
but got "
+ LOG(FATAL) << "The size of NDArray doesn't match the requested oneDNN
memory desc. "
+ << "oneDNN memory requests for " << desc.get_size() << " bytes,
but got "
<< shape().Size() * GetTypeSize(dtype_) << " bytes from
NDArray";
return nullptr;
}
@@ -937,7 +937,7 @@ void NDArray::SetTBlob() const {
auto stype = storage_type();
if (stype == kDefaultStorage) {
#if MXNET_USE_ONEDNN == 1
- CHECK(!IsDNNLData()) << "We can't generate TBlob for DNNL data. "
+ CHECK(!IsDNNLData()) << "We can't generate TBlob for oneDNN data. "
<< "Please use Reorder2Default() to generate a new
NDArray first";
#endif
dptr += byte_offset_;
diff --git a/src/operator/contrib/batch_norm_relu.cc
b/src/operator/contrib/batch_norm_relu.cc
index d223c65..e15bcbe 100644
--- a/src/operator/contrib/batch_norm_relu.cc
+++ b/src/operator/contrib/batch_norm_relu.cc
@@ -158,7 +158,7 @@ void BatchNormWithReLUComputeExCPU(const nnvm::NodeAttrs&
attrs,
});
return;
}
- LOG(FATAL) << "BatchNormWithReLU operator only supports DNNL Backend.";
+ LOG(FATAL) << "BatchNormWithReLU operator only supports oneDNN Backend.";
}
void BatchNormWithReLUGradComputeExCPU(const nnvm::NodeAttrs& attrs,
@@ -174,7 +174,7 @@ void BatchNormWithReLUGradComputeExCPU(const
nnvm::NodeAttrs& attrs,
DNNLBatchNormBackward<float>(attrs, ctx, inputs, req, outputs, fuse_relu);
return;
}
- LOG(FATAL) << "BatchNormWithReLU operator only supports DNNL Backend.";
+ LOG(FATAL) << "BatchNormWithReLU operator only supports oneDNN Backend.";
}
#endif
diff --git a/src/operator/nn/dnnl/dnnl_base-inl.h
b/src/operator/nn/dnnl/dnnl_base-inl.h
index 3ec2e32..7951569 100644
--- a/src/operator/nn/dnnl/dnnl_base-inl.h
+++ b/src/operator/nn/dnnl/dnnl_base-inl.h
@@ -225,7 +225,7 @@ static inline dnnl::memory::data_type get_dnnl_type(int
dtype) {
case mshadow::kUint8:
return dnnl::memory::data_type::u8;
default:
- LOG(FATAL) << "unknown type for DNNL :" << static_cast<int>(dtype);
+ LOG(FATAL) << "unknown type for oneDNN :" << static_cast<int>(dtype);
return dnnl::memory::data_type::undef;
}
}
@@ -258,7 +258,7 @@ static inline int get_mxnet_type(dnnl_data_type_t dtype) {
case dnnl::memory::data_type::u8:
return mshadow::kUint8;
default:
- LOG(FATAL) << "unknown DNNL type";
+ LOG(FATAL) << "unknown oneDNN data type";
return mshadow::kFloat32;
}
}
@@ -321,7 +321,7 @@ inline static dnnl::memory::desc GetWeightDesc(const
NDArray& arr,
} else {
const auto ndim = arr.shape().ndim();
CHECK((ndim == 3) || (ndim == 4) || (ndim == 5))
- << "DNNL weight currently supports 3d or 4d or 5d layout";
+ << "oneDNN weight currently supports 3d or 4d or 5d layout";
auto tz = dnnl::memory::dims{0};
int N = 0, C = 1, H = 2, W = 3;
int D = -1;
diff --git a/src/operator/nn/dnnl/dnnl_base.cc
b/src/operator/nn/dnnl/dnnl_base.cc
index adcd8f2..73e9225 100644
--- a/src/operator/nn/dnnl/dnnl_base.cc
+++ b/src/operator/nn/dnnl/dnnl_base.cc
@@ -76,8 +76,8 @@ dnnl::memory* TmpMemMgr::Alloc(const dnnl::memory::desc& md) {
// the space by itself. Thus, we just let it continue for estimating the
maximum
// required space size. It will be allocated at next call.
if (this->curr_mem && dmlc::GetEnv("MXNET_ONEDNN_DEBUG", false)) {
- LOG(WARNING) << "DNNL debug message: The rest of the temporary space is
not "
- << "adequate for allocating " << md.get_size() << " bytes.
Thus, DNNL "
+ LOG(WARNING) << "oneDNN debug message: The rest of the temporary space
is not "
+ << "adequate for allocating " << md.get_size() << " bytes.
Thus, oneDNN "
<< "allocates the space by itself.";
}
dnnl_mem_ptr ret(new dnnl::memory(md, CpuEngine::Get()->get_engine()));
@@ -330,7 +330,7 @@ dnnl_format_tag_t GetDefaultFormat(int num_dims) {
case 6:
return dnnl_abcdef;
default:
- LOG(FATAL) << "Not implemented dimension (" << num_dims << ") for DNNL";
+ LOG(FATAL) << "Not implemented dimension (" << num_dims << ") for
oneDNN";
return dnnl_format_tag_undef;
}
}
diff --git a/src/operator/nn/dnnl/dnnl_batch_norm-inl.h
b/src/operator/nn/dnnl/dnnl_batch_norm-inl.h
index f7dc97b..3902b2e 100644
--- a/src/operator/nn/dnnl/dnnl_batch_norm-inl.h
+++ b/src/operator/nn/dnnl/dnnl_batch_norm-inl.h
@@ -223,7 +223,7 @@ void DNNLBatchNormForward(const nnvm::NodeAttrs& attrs,
workspace = &outputs[3];
auto engine = CpuEngine::Get()->get_engine();
if (workspace == nullptr) {
- LOG(FATAL) << "DNNL BatchNorm: incorrect workspace input";
+ LOG(FATAL) << "oneDNN BatchNorm: incorrect workspace input";
}
auto ws = std::make_shared<dnnl::memory>(
fwd.GetPd().workspace_desc(), engine,
workspace->GetDNNLData()->get_data_handle());
@@ -257,7 +257,7 @@ void DNNLBatchNormForward(const nnvm::NodeAttrs& attrs,
}
}
} else { // no input gamma and beta
- LOG(FATAL) << "DNNL batch normalization: should not reach here ...";
+ LOG(FATAL) << "oneDNN batch normalization: should not reach here ...";
}
}
@@ -478,7 +478,7 @@ void DNNLBatchNormBackward(const nnvm::NodeAttrs& attrs,
}
}
} else {
- LOG(FATAL) << "DNNL batch normalization backward: should not reach here
...";
+ LOG(FATAL) << "oneDNN batch normalization backward: should not reach here
...";
}
}
} // namespace op
diff --git a/src/operator/nn/dnnl/dnnl_convolution.cc
b/src/operator/nn/dnnl/dnnl_convolution.cc
index 7910f65..314bc62 100644
--- a/src/operator/nn/dnnl/dnnl_convolution.cc
+++ b/src/operator/nn/dnnl/dnnl_convolution.cc
@@ -84,7 +84,7 @@ std::shared_ptr<dnnl::convolution_forward::primitive_desc>
GetConvFwdImpl(
padding[1] = param.conv_param.pad[1];
padding[2] = param.conv_param.pad[2];
} else {
- LOG(FATAL) << "Unexpected DNNL Conv kernel size " <<
param.conv_param.kernel.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv kernel size " <<
param.conv_param.kernel.ndim()
<< ", supporting only 1 or 2 or 3.";
}
dnnl::primitive_attr attr;
@@ -168,7 +168,7 @@ std::shared_ptr<dnnl::convolution_forward::primitive_desc>
GetConvFwdImpl(
dilates[1] = param.conv_param.dilate[1] - 1;
dilates[2] = param.conv_param.dilate[2] - 1;
} else {
- LOG(FATAL) << "Unexpected DNNL Conv dilate size " <<
param.conv_param.dilate.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv dilate size " <<
param.conv_param.dilate.ndim()
<< ", supporting only 1 or 2 or 3.";
}
if (bias_md_ptr == nullptr) {
@@ -235,7 +235,7 @@ static
std::shared_ptr<dnnl::convolution_backward_data::primitive_desc> GetConvB
padding[1] = param.pad[1];
padding[2] = param.pad[2];
} else {
- LOG(FATAL) << "Unexpected DNNL Conv kernel size " << param.kernel.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv kernel size " << param.kernel.ndim()
<< ", supporting only 1 or 2 or 3.";
}
@@ -278,7 +278,7 @@ static
std::shared_ptr<dnnl::convolution_backward_data::primitive_desc> GetConvB
dilates[1] = param.dilate[1] - 1;
dilates[2] = param.dilate[2] - 1;
} else {
- LOG(FATAL) << "Unexpected DNNL Conv dilate size " << param.dilate.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv dilate size " <<
param.dilate.ndim()
<< ", supporting only 1 or 2 or 3.";
}
dnnl::convolution_backward_data::desc
desc(dnnl::algorithm::convolution_direct,
@@ -331,7 +331,7 @@ static
std::shared_ptr<dnnl::convolution_backward_weights::primitive_desc> GetCo
padding[1] = param.pad[1];
padding[2] = param.pad[2];
} else {
- LOG(FATAL) << "Unexpected DNNL Conv kernel size " << param.kernel.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv kernel size " << param.kernel.ndim()
<< ", supporting only 1 or 2 or 3.";
}
@@ -385,7 +385,7 @@ static
std::shared_ptr<dnnl::convolution_backward_weights::primitive_desc> GetCo
dilates[1] = param.dilate[1] - 1;
dilates[2] = param.dilate[2] - 1;
} else {
- LOG(FATAL) << "Unexpected DNNL Conv dilate size " << param.dilate.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv dilate size " <<
param.dilate.ndim()
<< ", supporting only 1 or 2 or 3.";
}
if (bias == nullptr) {
diff --git a/src/operator/nn/dnnl/dnnl_fully_connected.cc
b/src/operator/nn/dnnl/dnnl_fully_connected.cc
index 7879497..eca90b7 100644
--- a/src/operator/nn/dnnl/dnnl_fully_connected.cc
+++ b/src/operator/nn/dnnl/dnnl_fully_connected.cc
@@ -65,7 +65,8 @@ dnnl::inner_product_forward::primitive_desc
GetFCFwdImpl(const DNNLFCFullParam&
return dnnl::inner_product_forward::primitive_desc(desc, attr, engine);
} catch (dnnl::error& e) {
if (e.status == dnnl_unimplemented && full_param.dnnl_param.quantized) {
- LOG(ERROR) << "AVX512-BW support or DNNL v0.18 is required for INT8
fully_connected.";
+ LOG(ERROR)
+ << "AVX512-BW support or oneDNN v0.18 or later is required for
INT8 fully_connected.";
} else {
LOG(ERROR) << e.message;
}
diff --git a/src/operator/nn/dnnl/dnnl_layer_norm.cc
b/src/operator/nn/dnnl/dnnl_layer_norm.cc
index 2e720d0..2c938db 100644
--- a/src/operator/nn/dnnl/dnnl_layer_norm.cc
+++ b/src/operator/nn/dnnl/dnnl_layer_norm.cc
@@ -112,7 +112,7 @@ inline dnnl::memory::desc GetMeanVarDesc(const
dnnl::memory::data_type& dtype,
}
inline dnnl::memory GetScaleShiftMem(const NDArray& gamma, const NDArray&
beta) {
- // OneDNN takes gamma and beta as one SCALE_SHIFT tensor when both scale and
shift are used. In
+ // oneDNN takes gamma and beta as one SCALE_SHIFT tensor when both scale and
shift are used. In
// mxnet scale is called gamma and shift is called beta.
constexpr size_t gammaAndBeta = 2;
CHECK_EQ(gamma.shape()[0], beta.shape()[0]);
diff --git a/src/operator/nn/dnnl/dnnl_pooling.cc
b/src/operator/nn/dnnl/dnnl_pooling.cc
index 252bf05..4452951 100644
--- a/src/operator/nn/dnnl/dnnl_pooling.cc
+++ b/src/operator/nn/dnnl/dnnl_pooling.cc
@@ -48,7 +48,7 @@ void DNNLPoolingFwd::Init(const mxnet::NDArray& input,
if (alg_kind != dnnl::algorithm::pooling_max && alg_kind !=
dnnl::algorithm::pooling_avg &&
alg_kind != dnnl::algorithm::pooling_avg_include_padding &&
alg_kind != dnnl::algorithm::pooling_avg_exclude_padding) {
- LOG(FATAL) << "DNNL Pooling: algorithm is not supported";
+ LOG(FATAL) << "oneDNN Pooling: algorithm is not supported";
}
dnnl::prop_kind prop = dnnl::prop_kind::forward_scoring;
@@ -56,7 +56,7 @@ void DNNLPoolingFwd::Init(const mxnet::NDArray& input,
prop = dnnl::prop_kind::forward_training;
}
if (is_train && prop == dnnl::prop_kind::forward_scoring) {
- LOG(INFO) << "DNNL Pooling: training with prop_kind is forward_scoring";
+ LOG(INFO) << "oneDNN Pooling: training with prop_kind is forward_scoring";
}
const auto fwd_desc =
@@ -87,7 +87,7 @@ void DNNLPoolingFwd::Execute(const NDArray& in_data,
auto engine = CpuEngine::Get()->get_engine();
if (workspace == nullptr) {
- LOG(FATAL) << "DNNL Pooling: incorrect workspace input";
+ LOG(FATAL) << "oneDNN Pooling: incorrect workspace input";
}
auto ws = std::make_shared<dnnl::memory>(
@@ -99,7 +99,7 @@ void DNNLPoolingFwd::Execute(const NDArray& in_data,
CommitOutput(out_data, output_mem_t_);
DNNLStream::Get()->Submit();
} else {
- LOG(FATAL) << "DNNL Pooling: forward primitive is nullptr";
+ LOG(FATAL) << "oneDNN Pooling: forward primitive is nullptr";
}
}
@@ -116,7 +116,7 @@ dnnl::algorithm GetDNNLPoolAlgo(const PoolingParam& param) {
}
break;
default:
- LOG(FATAL) << "DNNL Pooling: Unknown pooling method.";
+ LOG(FATAL) << "oneDNN Pooling: Unknown pooling method.";
return dnnl::algorithm::pooling_max;
}
}
diff --git a/src/operator/nn/dnnl/dnnl_rnn.cc b/src/operator/nn/dnnl/dnnl_rnn.cc
index 051de78..22b9e27 100644
--- a/src/operator/nn/dnnl/dnnl_rnn.cc
+++ b/src/operator/nn/dnnl/dnnl_rnn.cc
@@ -145,7 +145,7 @@ DNNLRnnFullParam DNNLRnnFullParamParser(const RNNParam&
rnn_param,
void DNNLRnnMemMgr::Init(dim_t size, const Context& ctx) {
workspace_ = NDArray(TShape({size}), ctx, false, mshadow::kUint8);
if (workspace_.data().dptr_ == nullptr)
- LOG(FATAL) << "DNNL RNN operator memory allocation error.";
+ LOG(FATAL) << "oneDNN RNN operator memory allocation error.";
curr_mem = static_cast<char*>(workspace_.data().dptr_);
mem_size = size;
curr_size = size;
@@ -1265,7 +1265,7 @@ void DNNLRnnOp::Backward(const OpContext& ctx,
}
// Fetch weights, src and dst from Forward layer
if (bwd_vec_.size() != fwd_trn_vec_.size())
- LOG(FATAL) << "DNNL RNN fusion error.";
+ LOG(FATAL) << "oneDNN RNN fusion error.";
for (size_t lyr = 0; lyr < bwd_vec_.size(); ++lyr) {
bwd_vec_.at(lyr).FetchDataWeightsMem(fwd_trn_vec_.at(lyr));
bwd_vec_.at(lyr).SetWeightsGradsMem();
diff --git a/src/operator/quantization/dnnl/dnnl_quantize-inl.h
b/src/operator/quantization/dnnl/dnnl_quantize-inl.h
index 7a53ab1..13f2e1e 100644
--- a/src/operator/quantization/dnnl/dnnl_quantize-inl.h
+++ b/src/operator/quantization/dnnl/dnnl_quantize-inl.h
@@ -58,7 +58,7 @@ static void DNNLQuantizeComputeKer(const
std::vector<NDArray>& inputs,
*outputs[1].data().dptr<float>() = -real_range;
*outputs[2].data().dptr<float>() = real_range;
} else {
- LOG(FATAL) << "dnnl quantize op only supports int8 and uint8 as output
type";
+ LOG(FATAL) << "oneDNN quantize op only supports int8 and uint8 as output
type";
}
float scale = quantized_range / real_range;
dnnl::primitive_attr attr;
@@ -101,7 +101,7 @@ static void DNNLQuantizeCompute(const nnvm::NodeAttrs&
attrs,
} else if (param.out_type == mshadow::kInt8) {
DNNLQuantizeComputeKer<float, int8_t>(inputs, outputs, param, req);
} else {
- LOG(FATAL) << "dnnl quantize op only supports int8 and uint8 as output
type";
+ LOG(FATAL) << "oneDNN quantize op only supports int8 and uint8 as output
type";
}
}
diff --git a/src/operator/quantization/dnnl/dnnl_quantize_v2-inl.h
b/src/operator/quantization/dnnl/dnnl_quantize_v2-inl.h
index 1acc8a5..6181132 100644
--- a/src/operator/quantization/dnnl/dnnl_quantize_v2-inl.h
+++ b/src/operator/quantization/dnnl/dnnl_quantize_v2-inl.h
@@ -128,7 +128,7 @@ void SgDNNLQuantizeOperator::Forward(const OpContext& ctx,
*outputs[1].data().dptr<float>() = -real_range;
*outputs[2].data().dptr<float>() = real_range;
} else {
- LOG(FATAL) << "dnnl quantize op only supports int8 and uint8 as output
type";
+ LOG(FATAL) << "oneDNN quantize op only supports int8 and uint8 as output
type";
}
if (!initalized_) {
diff --git a/src/operator/quantization/dnnl/dnnl_requantize-inl.h
b/src/operator/quantization/dnnl/dnnl_requantize-inl.h
index 5eea9dc..2dc61d6 100644
--- a/src/operator/quantization/dnnl/dnnl_requantize-inl.h
+++ b/src/operator/quantization/dnnl/dnnl_requantize-inl.h
@@ -142,7 +142,7 @@ static void DNNLRequantizeForward(const nnvm::NodeAttrs&
attrs,
} else if (out_type == mshadow::kInt8) {
DNNLRequantizeForwardKer<int8_t>(attrs, ctx, inputs, req, outputs,
real_range);
} else {
- LOG(FATAL) << "dnnl requantize op only supports int8 and uint8 as output
type";
+ LOG(FATAL) << "oneDNN requantize op only supports int8 and uint8 as output
type";
}
}
diff --git a/src/operator/quantization/quantized_batch_norm.cc
b/src/operator/quantization/quantized_batch_norm.cc
index 9b1fd2a..009d6be 100644
--- a/src/operator/quantization/quantized_batch_norm.cc
+++ b/src/operator/quantization/quantized_batch_norm.cc
@@ -70,7 +70,7 @@ bool QuantizedBatchNormType(const nnvm::NodeAttrs& attrs,
#if MXNET_USE_ONEDNN == 1
CHECK(in_type->at(0) == mshadow::kInt8 || in_type->at(0) == mshadow::kUint8)
- << "QuantizedBatchNorm with DNNL backend only supports int8/uint8 input,
while "
+ << "QuantizedBatchNorm with oneDNN backend only supports int8/uint8
input, while "
<< in_type->at(0) << " is given.";
#else
TYPE_ASSIGN_CHECK(*in_type, 0, mshadow::kInt8);
diff --git a/src/operator/quantization/quantized_conv.cc
b/src/operator/quantization/quantized_conv.cc
index cd93ceb..95fbd3b 100644
--- a/src/operator/quantization/quantized_conv.cc
+++ b/src/operator/quantization/quantized_conv.cc
@@ -41,7 +41,7 @@ bool QuantizedConvShape(const nnvm::NodeAttrs& attrs,
if (param.layout.has_value()) {
#if MXNET_USE_ONEDNN == 1
CHECK(param.layout.value() == mshadow::kNCHW || param.layout.value() ==
mshadow::kNCDHW)
- << "dnnl quantized_conv now supports NCHW or NCDHW for now";
+ << "oneDNN quantized_conv only supports NCHW and NCDHW for now";
#else
CHECK_EQ(param.layout.value(), mshadow::kNCHW) << "quantized_conv only
supports NCHW for now";
#endif
@@ -55,9 +55,9 @@ bool QuantizedConvShape(const nnvm::NodeAttrs& attrs,
#if MXNET_USE_ONEDNN == 1
CHECK(kernel_ndims == 2U || kernel_ndims == 3U)
- << "dnnl quantized_conv only supports 2d or 3d kernel for now";
+ << "oneDNN quantized_conv only supports 2d and 3d kernel for now";
CHECK(data_ndims == 4U || data_ndims == 5U)
- << "dnnl quantized_conv only supports 4d or 5d layout for now";
+ << "oneDNN quantized_conv only supports 4d and 5d layout for now";
#else
CHECK_EQ(kernel_ndims, 2U) << "quantized_conv only supports 2D convolution
for now";
CHECK(param.dilate.ndim() == 0U || param.dilate.Size() == 1U)
diff --git a/src/operator/quantization/quantized_elemwise_add.cc
b/src/operator/quantization/quantized_elemwise_add.cc
index b314e9e..262f6e8 100644
--- a/src/operator/quantization/quantized_elemwise_add.cc
+++ b/src/operator/quantization/quantized_elemwise_add.cc
@@ -84,8 +84,8 @@ void QuantizedElemwiseAddForward(const nnvm::NodeAttrs& attrs,
const std::vector<TBlob>& in_data,
const std::vector<OpReqType>& req,
const std::vector<TBlob>& out_data) {
- LOG(FATAL) << "Not supported for MXNet built without DNNL. "
- "Please install DNNL enabled MXNet.";
+ LOG(FATAL) << "Not supported for MXNet built without oneDNN. "
+ "Please install oneDNN enabled MXNet.";
}
NNVM_REGISTER_OP(_contrib_quantized_elemwise_add)
diff --git a/src/operator/quantization/quantized_pooling.cc b/src/operator/quantization/quantized_pooling.cc
index 14ec43296..8736d03 100644
--- a/src/operator/quantization/quantized_pooling.cc
+++ b/src/operator/quantization/quantized_pooling.cc
@@ -44,12 +44,12 @@ bool QuantizedPoolingShape(const nnvm::NodeAttrs& attrs,
#if MXNET_USE_ONEDNN == 1
CHECK(data_ndims == 4U || data_ndims == 5U)
- << "DNNL QuantizedPoolingOp only supports 4D/5D layout yet, input should be 4D in"
+ << "oneDNN QuantizedPoolingOp only supports 4D/5D layout for now, input should be 4D in "
<< "(batch, channel, y, x) or 5D in (batch, channel, d, y, x)";
CHECK(layout == mshadow::kNCHW || layout == mshadow::kNCDHW)
- << "DNNL QuantizedPoolingOp only supports NCHW/NCDHW layout for now, saw " << layout;
+ << "oneDNN QuantizedPoolingOp only supports NCHW/NCDHW layout for now, saw " << layout;
CHECK(kernel_ndims == 2U || kernel_ndims == 3U)
- << "DNNL QuantizedPoolingOp only supports 2D/3D pooling for now, saw" << kernel_ndims;
+ << "oneDNN QuantizedPoolingOp only supports 2D/3D pooling for now, saw" << kernel_ndims;
#else
CHECK_EQ(data_ndims, 4U) << "quantized_pooling: Input data should be 4D in "
<< "(batch, channel, y, x)";
diff --git a/src/operator/subgraph/dnnl/dnnl_batch_dot_property.h b/src/operator/subgraph/dnnl/dnnl_batch_dot_property.h
index d2f33aa..c4dee3e 100644
--- a/src/operator/subgraph/dnnl/dnnl_batch_dot_property.h
+++ b/src/operator/subgraph/dnnl/dnnl_batch_dot_property.h
@@ -50,7 +50,7 @@ class SgDNNLBatchDotSelector : public SubgraphSelector {
class SgDNNLBatchDotProperty : public SubgraphProperty {
public:
static SubgraphPropertyPtr Create() {
- static const std::string& name = "DNNL Batch Dot optimization pass";
+ static const std::string& name = "oneDNN Batch Dot optimization pass";
auto property = std::make_shared<SgDNNLBatchDotProperty>();
property->SetAttr<std::string>("property_name", name);
property->SetAttr<bool>("inference_only", true);
diff --git a/src/operator/subgraph/dnnl/dnnl_conv.cc b/src/operator/subgraph/dnnl/dnnl_conv.cc
index bc1f6fd..7bc1b24 100644
--- a/src/operator/subgraph/dnnl/dnnl_conv.cc
+++ b/src/operator/subgraph/dnnl/dnnl_conv.cc
@@ -321,7 +321,7 @@ void SgDNNLConvOperator::Forward(const OpContext& ctx,
if (dnnl_param.with_act &&
full_conv_param.act_param.alg == dnnl::algorithm::eltwise_bounded_relu) {
if (dnnl_param.with_sum) {
- LOG(ERROR) << "dnnl doesn't support conv + relu + sum fusion yet.";
+ LOG(ERROR) << "oneDNN doesn't support conv + relu + sum fusion yet.";
full_conv_param.act_param.alpha *= output_scale;
} else {
// For conv+relu6 without sum, we don't need post_ops as output_scale can do the cut off.
diff --git a/src/operator/subgraph/dnnl/dnnl_fc.cc b/src/operator/subgraph/dnnl/dnnl_fc.cc
index 44c1a35..51989ca 100644
--- a/src/operator/subgraph/dnnl/dnnl_fc.cc
+++ b/src/operator/subgraph/dnnl/dnnl_fc.cc
@@ -56,7 +56,7 @@ class SgDNNLFCOp {
const std::vector<NDArray>& inputs,
const std::vector<OpReqType>& req,
const std::vector<NDArray>& outputs) {
- LOG(FATAL) << "Not implemented: subgraph dnnl fully connected only supports "
+ LOG(FATAL) << "Not implemented: subgraph oneDNN fully connected only supports "
"inference computation.";
}
diff --git a/src/operator/subgraph/dnnl/dnnl_matmul_post_quantize_property.h b/src/operator/subgraph/dnnl/dnnl_matmul_post_quantize_property.h
index 6fbd97f..6c384a1 100644
--- a/src/operator/subgraph/dnnl/dnnl_matmul_post_quantize_property.h
+++ b/src/operator/subgraph/dnnl/dnnl_matmul_post_quantize_property.h
@@ -136,7 +136,7 @@ class SgDNNLMatmulPostQuantizeProperty : public SubgraphProperty {
}
static SubgraphPropertyPtr Create() {
- static const std::string& name = "DNNL Matmul post-quantization optimization pass";
+ static const std::string& name = "oneDNN Matmul post-quantization optimization pass";
auto property = std::make_shared<SgDNNLMatmulPostQuantizeProperty>();
property->SetAttr<std::string>("property_name", name);
property->SetAttr<bool>("inference_only", true);
diff --git a/src/operator/tensor/cast_storage-inl.h b/src/operator/tensor/cast_storage-inl.h
index 7c6f83a..ee32915 100644
--- a/src/operator/tensor/cast_storage-inl.h
+++ b/src/operator/tensor/cast_storage-inl.h
@@ -445,8 +445,8 @@ inline bool CastStorageInferStorageType(const nnvm::NodeAttrs& attrs,
// dns -> dns
DispatchMode mode = DispatchMode::kFCompute;
#if MXNET_USE_ONEDNN == 1
- // If we use DNNL and the arrays are in CPU memory, the array may store
- // DNNL layout, we should convert its layout explicitly.
+ // If we use oneDNN and the arrays are in CPU memory, the array may store
+ // oneDNN layout, we should convert its layout explicitly.
if (dev_mask == kCPU)
mode = DispatchMode::kFComputeEx;
#endif
diff --git a/src/operator/tensor/elemwise_unary_op.h b/src/operator/tensor/elemwise_unary_op.h
index f516a78..5d23c98 100644
--- a/src/operator/tensor/elemwise_unary_op.h
+++ b/src/operator/tensor/elemwise_unary_op.h
@@ -399,8 +399,8 @@ class UnaryOp : public OpBase {
});
} break;
case kWriteInplace:
-// cannot check if ptrs are the same for DNNL because we may have
-// created copies of input when reordering. WriteInPlace will still write to original array
+// cannot check if ptrs are the same for oneDNN because we may have created
+// copies of input when reordering. WriteInPlace will still write to original array
#if MXNET_USE_ONEDNN == 0
CHECK_EQ(inputs[0].dptr_, outputs[0].dptr_);
#endif
diff --git a/tests/cpp/include/test_dnnl.h b/tests/cpp/include/test_dnnl.h
index 359a0f2..7172b0b 100644
--- a/tests/cpp/include/test_dnnl.h
+++ b/tests/cpp/include/test_dnnl.h
@@ -400,17 +400,17 @@ inline std::vector<NDArrayAttrs> GetTestInputArrays(int types = A
// Type 2, 3.
arr = NDArray(shape, Context());
if (shape.ndim() == md.data.ndims && IsSameShape(md, shape) && types & ArrayTypes::DNNL) {
- desc_str = "DNNL NDArray";
+ desc_str = "oneDNN NDArray";
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr, desc_str);
} else if (shape.ndim() == md.data.ndims && !IsSameShape(md, shape) && types & ArrayTypes::DNNLDiffShape) {
- desc_str = "DNNL NDArray with different shape";
+ desc_str = "oneDNN NDArray with different shape";
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr, desc_str);
} else if (shape.ndim() != md.data.ndims && types & ArrayTypes::DNNLDiffDim) {
std::stringstream ss;
- ss << "DNNL NDArray with different dim " << shape.ndim() << "/" << md.data.ndims;
+ ss << "oneDNN NDArray with different dim " << shape.ndim() << "/" << md.data.ndims;
desc_str = ss.str();
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr, desc_str);
@@ -420,17 +420,17 @@ inline std::vector<NDArrayAttrs> GetTestInputArrays(int types = A
arr = NDArray(shape, Context());
if (shape.ndim() == md.data.ndims && IsSameShape(md, shape) && types & ArrayTypes::DNNLReshaped) {
- desc_str = "Reshaped DNNL NDArray";
+ desc_str = "Reshaped oneDNN NDArray";
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr.Slice(slice_amount, arr.shape()[0] - slice_amount), desc_str);
} else if (shape.ndim() == md.data.ndims && !IsSameShape(md, shape) && types & ArrayTypes::DNNLReshapedDiffShape) {
- desc_str = "Reshaped DNNL NDArray with different shape";
+ desc_str = "Reshaped oneDNN NDArray with different shape";
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr.Slice(slice_amount, arr.shape()[0] - slice_amount), desc_str);
} else if (shape.ndim() != md.data.ndims && types & ArrayTypes::DNNLReshapedDiffDim) {
std::stringstream ss;
- ss << "DNNL NDArray with different dim " << shape.ndim() << "/" << md.data.ndims;
+ ss << "oneDNN NDArray with different dim " << shape.ndim() << "/" << md.data.ndims;
desc_str = ss.str();
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr.Slice(slice_amount, arr.shape()[0] - slice_amount), desc_str);
@@ -532,10 +532,10 @@ inline std::vector<NDArrayAttrs> GetTestOutputArrays(const mxnet::TShape& shp,
// Type 2, 3.
arr = NDArray(shape, Context());
- desc_str = "DNNL NDArray";
+ desc_str = "oneDNN NDArray";
if (shape.ndim() != md.data.ndims) {
std::stringstream ss;
- ss << "DNNL NDArray with different memory layout " << shape.ndim() << "/" << md.data.ndims;
+ ss << "oneDNN NDArray with different memory layout " << shape.ndim() << "/" << md.data.ndims;
desc_str = ss.str();
}
@@ -552,10 +552,10 @@ inline std::vector<NDArrayAttrs> GetTestOutputArrays(const mxnet::TShape& shp,
NDArray arr = NDArray(s, Context());
arr = arr.AsArray(shape, arr.dtype());
InitDNNLArray(&arr, md, rand, max);
- desc_str = "Reused DNNL NDArray";
+ desc_str = "Reused oneDNN NDArray";
if (shape.ndim() != md.data.ndims) {
std::stringstream ss;
- ss << "Reused DNNL NDArray with different memory layout " << shape.ndim() << "/"
+ ss << "Reused oneDNN NDArray with different memory layout " << shape.ndim() << "/"
<< md.data.ndims;
desc_str = ss.str();
}
diff --git a/tests/cpp/operator/dnnl_test.cc b/tests/cpp/operator/dnnl_test.cc
index 84b1a5a..99ed3c0 100644
--- a/tests/cpp/operator/dnnl_test.cc
+++ b/tests/cpp/operator/dnnl_test.cc
@@ -164,7 +164,7 @@ TEST(DNNL_NDArray, GetDataReorder) {
printf("Init array (");
for (size_t i = 0; i < s.ndim(); i++)
printf("%ld, ", s[i]);
- printf(") with DNNL memory (");
+ printf(") with oneDNN memory (");
for (int i = 0; i < md.data.ndims; i++)
printf("%ld, ", md.data.dims[i]);
printf("), format: %d\n", static_cast<int>(GetDefaultFormat(md)));
diff --git a/tests/nightly/test_np_large_array.py b/tests/nightly/test_np_large_array.py
index ba9369a..d415c8a 100644
--- a/tests/nightly/test_np_large_array.py
+++ b/tests/nightly/test_np_large_array.py
@@ -2066,7 +2066,7 @@ def test_rnn_dim_check():
@use_np
[email protected](reason='runs without DNNL, wtih is not default behavior')
[email protected](reason='runs without oneDNN, which is not default behavior')
def test_rnn_vanilla():
L_SEQ, BAT, L_INP, L_STA = 2**20, 4, 2**10, 2
def batch_check(x, modes, params):
diff --git a/tests/python/dnnl/subgraphs/test_conv_subgraph.py b/tests/python/dnnl/subgraphs/test_conv_subgraph.py
index 0b0840c..6b6169b 100644
--- a/tests/python/dnnl/subgraphs/test_conv_subgraph.py
+++ b/tests/python/dnnl/subgraphs/test_conv_subgraph.py
@@ -446,10 +446,10 @@ def test_deduplication(data_shape, reverse_sum_order, model_name):
model_dedup.initialize()
model_no_dedup = copy.copy(model_dedup)
- model_dedup.optimize_for(data_nd, backend='DNNL', dedup_subgraph = True, skip_infer = True)
+ model_dedup.optimize_for(data_nd, backend='ONEDNN', dedup_subgraph = True, skip_infer = True)
out = model_dedup(data_nd)
- model_dedup.optimize_for(data_nd, backend='DNNL', dedup_subgraph = False, skip_infer = True)
+ model_dedup.optimize_for(data_nd, backend='ONEDNN', dedup_subgraph = False, skip_infer = True)
out_dedup = model_no_dedup(data_nd)
assert_almost_equal(out.asnumpy(), out_dedup.asnumpy(), rtol=1e-3, atol=1e-1)
@@ -776,7 +776,7 @@ def test_bn_relu_fusion(axis):
out1 = net(dummy_data)
out1.wait_to_read()
- net.optimize_for(dummy_data, backend='DNNL')
+ net.optimize_for(dummy_data, backend='ONEDNN')
out2 = net(dummy_data)
assert_almost_equal(out1, out2)
diff --git a/tests/python/gpu/test_gluon_model_zoo_gpu.py b/tests/python/gpu/test_gluon_model_zoo_gpu.py
index 18d42df..4e4d3c6 100644
--- a/tests/python/gpu/test_gluon_model_zoo_gpu.py
+++ b/tests/python/gpu/test_gluon_model_zoo_gpu.py
@@ -97,14 +97,14 @@ def get_nn_model(name):
else:
return get_model(name)
-# Seed 1521019752 produced a failure on the Py2 DNNL-GPU CI runner
+# Seed 1521019752 produced a failure on the Py2 oneDNN-GPU CI runner
# on 2/16/2018 that was not reproducible. Problem could be timing related or
# based on non-deterministic algo selection.
@mx.util.use_np
@pytest.mark.serial
def test_training():
# We use network models without dropout for testing.
- # TODO(zhengda) mobilenet can't pass this test even without DNNL.
+ # TODO(zhengda) mobilenet can't pass this test even without oneDNN.
all_models = ['resnet18_v1', 'densenet121']
batch_size = 10
diff --git a/tests/python/quantization/test_quantization.py b/tests/python/quantization/test_quantization.py
index 2360347..8f03c84 100644
--- a/tests/python/quantization/test_quantization.py
+++ b/tests/python/quantization/test_quantization.py
@@ -218,7 +218,7 @@ def test_quantized_conv():
return
elif is_test_for_dnnl():
# (TODO)Xinyu: https://github.com/apache/incubator-mxnet/issues/16830
- print('skipped testing quantized_conv for dnnl cpu since it is a flaky case')
+ print('skipped testing quantized_conv for oneDNN cpu since it is a flaky case')
return
elif qdtype == 'uint8' and is_test_for_gpu():
print('skipped testing quantized_conv for gpu uint8 since it is not supported yet')
@@ -823,7 +823,7 @@ def test_quantized_act():
print('skipped testing quantized_act for native cpu since it is not supported yet')
return
elif qdtype == 'int8' and is_test_for_dnnl():
- print('skipped testing quantized_act for dnnl cpu int8 since it is not supported yet')
+ print('skipped testing quantized_act for oneDNN cpu int8 since it is not supported yet')
return
elif is_test_for_gpu():
print('skipped testing quantized_act for gpu since it is not supported yet')
@@ -1058,7 +1058,7 @@ def test_quantize_model():
print('skipped testing quantize_model for native cpu since it is not supported yet')
return True
elif qdtype == 'int8' and is_test_for_dnnl():
- print('skipped testing quantize_model for dnnl cpu int8 since it is not supported yet')
+ print('skipped testing quantize_model for oneDNN cpu int8 since it is not supported yet')
return True
elif qdtype == 'uint8' and is_test_for_gpu():
print('skipped testing quantize_model for gpu uint8 since it is not supported yet')
@@ -1070,7 +1070,7 @@ def test_quantize_model():
print('skipped testing quantize_model for native cpu since it is not supported yet')
return
elif qdtype == 'int8' and is_test_for_dnnl():
- print('skipped testing quantize_model for dnnl cpu int8 since it is not supported yet')
+ print('skipped testing quantize_model for oneDNN cpu int8 since it is not supported yet')
return
elif qdtype == 'uint8' and is_test_for_gpu():
print('skipped testing quantize_model for gpu uint8 since it is not supported yet')
diff --git a/tests/python/unittest/test_numpy_gluon.py b/tests/python/unittest/test_numpy_gluon.py
index 1241ead..0be4cad 100644
--- a/tests/python/unittest/test_numpy_gluon.py
+++ b/tests/python/unittest/test_numpy_gluon.py
@@ -434,7 +434,7 @@ def test_optimize_for():
out = net(a)
b = net.collect_params().pop('d.weight').data()
- net.optimize_for(a, b, backend="DNNL")
+ net.optimize_for(a, b, backend="ONEDNN")
out2 = net(a)
diff --git a/tools/dependencies/README.md b/tools/dependencies/README.md
index acc5d92..9ad6d78 100644
--- a/tools/dependencies/README.md
+++ b/tools/dependencies/README.md
@@ -52,12 +52,12 @@ MXNet is built on top of many dependencies. Managing these dependencies could be
## Overview
-The dependencies could be categorized by several groups: BLAS libraries, CPU-based performance boost library, i.e. ONEDNN and GPU-based performance boosting library including CUDA, cuDNN, NCCL. and others including OpenCV, Numpy, S3-related, PS-lite dependencies. The list below shows all the dependencies and their version. Except for CUDA, cuDNN, NCCL which the user is required to install on their environments, we statically link those dependencies into libmxnet.so when we build PyPi pac [...]
+The dependencies could be categorized by several groups: BLAS libraries, CPU-based performance boost library, i.e. oneDNN and GPU-based performance boosting library including CUDA, cuDNN, NCCL. and others including OpenCV, Numpy, S3-related, PS-lite dependencies. The list below shows all the dependencies and their version. Except for CUDA, cuDNN, NCCL which the user is required to install on their environments, we statically link those dependencies into libmxnet.so when we build PyPi pac [...]
| Dependencies | MXNet Version |
| :------------: |:-------------:|
|OpenBLAS| 0.3.9 |
-|ONEDNN| 2.0 |
+|oneDNN| 2.3.2 |
|CUDA| 10.1 |
|cuDNN| 7.5.1 |
|NCCL| 2.4.2 |
@@ -105,7 +105,7 @@ sudo apt-get install -y git \
pkg-config
```
-### MKL, ONEDNN
+### MKL, oneDNN
@pengzhao-intel (https://github.com/apache/incubator-mxnet/commits?author=pengzhao-intel) and his team are tracking and updating these versions. Kudos to them!
diff --git a/tools/pip/doc/CPU_ADDITIONAL.md b/tools/pip/doc/CPU_ADDITIONAL.md
index 6cb82b8..7aa6a95 100644
--- a/tools/pip/doc/CPU_ADDITIONAL.md
+++ b/tools/pip/doc/CPU_ADDITIONAL.md
@@ -26,7 +26,7 @@ This package supports Linux, Mac OSX, and Windows platforms. You may also want t
- [mxnet-cu102](https://pypi.python.org/pypi/mxnet-cu102/) with CUDA-10.2 support.
- [mxnet-cu101](https://pypi.python.org/pypi/mxnet-cu101/) with CUDA-10.1 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To use this package on Linux you need the `libquadmath.so.0` shared library. On Debian based systems, including Ubuntu, run `sudo apt install libquadmath0` to
diff --git a/tools/pip/doc/CU101_ADDITIONAL.md b/tools/pip/doc/CU101_ADDITIONAL.md
index bcf0be7..3d92b11 100644
--- a/tools/pip/doc/CU101_ADDITIONAL.md
+++ b/tools/pip/doc/CU101_ADDITIONAL.md
@@ -25,7 +25,7 @@ This package supports Linux and Windows platforms. You may also want to check:
- [mxnet-cu110](https://pypi.python.org/pypi/mxnet-cu110/) with CUDA-11.0 support.
- [mxnet-cu102](https://pypi.python.org/pypi/mxnet-cu102/) with CUDA-10.2 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To download CUDA, check [CUDA download](https://developer.nvidia.com/cuda-downloads). For more instructions, check [CUDA Toolkit online documentation](http://docs.nvidia.com/cuda/index.html).
diff --git a/tools/pip/doc/CU102_ADDITIONAL.md b/tools/pip/doc/CU102_ADDITIONAL.md
index a227957..1f580bf 100644
--- a/tools/pip/doc/CU102_ADDITIONAL.md
+++ b/tools/pip/doc/CU102_ADDITIONAL.md
@@ -25,7 +25,7 @@ This package supports Linux and Windows platforms. You may also want to check:
- [mxnet-cu110](https://pypi.python.org/pypi/mxnet-cu110/) with CUDA-11.0 support.
- [mxnet-cu101](https://pypi.python.org/pypi/mxnet-cu101/) with CUDA-10.1 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To download CUDA, check [CUDA download](https://developer.nvidia.com/cuda-downloads). For more instructions, check [CUDA Toolkit online documentation](http://docs.nvidia.com/cuda/index.html).
diff --git a/tools/pip/doc/CU110_ADDITIONAL.md b/tools/pip/doc/CU110_ADDITIONAL.md
index f78a945..8774b76 100644
--- a/tools/pip/doc/CU110_ADDITIONAL.md
+++ b/tools/pip/doc/CU110_ADDITIONAL.md
@@ -25,7 +25,7 @@ This package supports Linux and Windows platforms. You may also want to check:
- [mxnet-cu102](https://pypi.python.org/pypi/mxnet-cu102/) with CUDA-10.2 support.
- [mxnet-cu101](https://pypi.python.org/pypi/mxnet-cu101/) with CUDA-10.1 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To download CUDA, check [CUDA download](https://developer.nvidia.com/cuda-downloads). For more instructions, check [CUDA Toolkit online documentation](http://docs.nvidia.com/cuda/index.html).
diff --git a/tools/pip/doc/CU112_ADDITIONAL.md b/tools/pip/doc/CU112_ADDITIONAL.md
index 37686ab..340ca13 100644
--- a/tools/pip/doc/CU112_ADDITIONAL.md
+++ b/tools/pip/doc/CU112_ADDITIONAL.md
@@ -25,7 +25,7 @@ This package supports Linux and Windows platforms. You may also want to check:
- [mxnet-cu102](https://pypi.python.org/pypi/mxnet-cu102/) with CUDA-10.2 support.
- [mxnet-cu101](https://pypi.python.org/pypi/mxnet-cu101/) with CUDA-10.1 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To download CUDA, check [CUDA download](https://developer.nvidia.com/cuda-downloads). For more instructions, check [CUDA Toolkit online documentation](http://docs.nvidia.com/cuda/index.html).
diff --git a/tools/pip/doc/NATIVE_ADDITIONAL.md b/tools/pip/doc/NATIVE_ADDITIONAL.md
index 36de931..4a303e8 100644
--- a/tools/pip/doc/NATIVE_ADDITIONAL.md
+++ b/tools/pip/doc/NATIVE_ADDITIONAL.md
@@ -26,7 +26,7 @@ This package supports Linux and Windows platforms. You may also want to check:
- [mxnet-cu102](https://pypi.python.org/pypi/mxnet-cu102/) with CUDA-10.2 support.
- [mxnet-cu101](https://pypi.python.org/pypi/mxnet-cu101/) with CUDA-10.1 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To download CUDA, check [CUDA download](https://developer.nvidia.com/cuda-downloads). For more instructions, check [CUDA Toolkit online documentation](http://docs.nvidia.com/cuda/index.html).
diff --git a/tools/staticbuild/README.md b/tools/staticbuild/README.md
index 087fbf4..c7fb62b 100644
--- a/tools/staticbuild/README.md
+++ b/tools/staticbuild/README.md
@@ -33,13 +33,13 @@ Ubuntu systems.
```
tools/staticbuild/build.sh cu112
```
-This would build the mxnet package based on CUDA 11.2. Currently, we support variants cpu, native, cu101, cu102, cu110, and cu112. All of these variants expect native have ONEDNN backend enabled.
+This would build the mxnet package based on CUDA 11.2. Currently, we support variants cpu, native, cu101, cu102, cu110, and cu112. All of these variants expect native have oneDNN backend enabled.
```
tools/staticbuild/build.sh cpu
```
-This would build the mxnet package based on ONEDNN.
+This would build the mxnet package based on oneDNN.
As the result, users would have a complete static dependencies in `/staticdeps` in the root folder as well as a static-linked `libmxnet.so` file lives in `lib`. You can build your language binding by using the `libmxnet.so`.