This is an automated email from the ASF dual-hosted git repository.
bgawrych pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-mxnet.git
The following commit(s) were added to refs/heads/master by this push:
new 024d01e Unify all names used to refer to oneDNN library in logs and docs to oneDNN (#20719)
024d01e is described below
commit 024d01e0d7f4892ad7135faf9f39ac5d20247792
Author: bartekkuncer <[email protected]>
AuthorDate: Mon Nov 22 07:41:25 2021 +0100
Unify all names used to refer to oneDNN library in logs and docs to oneDNN (#20719)
* Unify all names used to refer to oneDNN library in logs and docs to oneDNN
* Reviews
* Update src/operator/nn/dnnl/dnnl_base-inl.h
Co-authored-by: Andrzej Kotłowski <[email protected]>
* Update src/operator/nn/dnnl/dnnl_fully_connected.cc
Co-authored-by: Andrzej Kotłowski <[email protected]>
* Update tests/nightly/test_np_large_array.py
Co-authored-by: Andrzej Kotłowski <[email protected]>
* Fix sanity
Co-authored-by: Andrzej Kotłowski <[email protected]>
---
CMakeLists.txt | 4 +--
benchmark/opperf/README.md | 2 +-
cd/README.md | 8 +++---
cd/utils/artifact_repository.md | 4 +--
cd/utils/artifact_repository.py | 2 +-
cd/utils/test_artifact_repository.py | 6 ++---
ci/dev_menu.py | 4 +--
ci/docker/runtime_functions.sh | 2 +-
ci/jenkins/Jenkins_steps.groovy | 30 +++++++++++-----------
config/darwin.cmake | 2 +-
config/distribution/darwin_cpu.cmake | 2 +-
config/distribution/darwin_cpu_mkl.cmake | 2 +-
config/distribution/darwin_native.cmake | 2 +-
config/distribution/linux_cpu.cmake | 2 +-
config/distribution/linux_cpu_mkl.cmake | 2 +-
config/distribution/linux_cu100.cmake | 2 +-
config/distribution/linux_cu101.cmake | 2 +-
config/distribution/linux_cu102.cmake | 2 +-
config/distribution/linux_cu110.cmake | 2 +-
config/distribution/linux_cu112.cmake | 2 +-
config/distribution/linux_cu92.cmake | 2 +-
config/distribution/linux_native.cmake | 2 +-
config/linux.cmake | 2 +-
config/linux_gpu.cmake | 2 +-
docs/python_docs/python/tutorials/index.rst | 2 +-
.../tutorials/performance/backend/profiler.md | 4 +--
.../src/_includes/get_started/cloud/cpu.md | 2 +-
.../src/_includes/get_started/cloud/gpu.md | 2 +-
.../cpp/docs/tutorials/multi_threaded_inference.md | 2 +-
docs/static_site/src/pages/api/faq/cloud.md | 4 +--
docs/static_site/src/pages/api/faq/env_var.md | 8 +++---
.../src/pages/api/faq/large_tensor_support.md | 4 +--
.../src/pages/api/faq/tensor_inspector_tutorial.md | 2 +-
example/README.md | 2 +-
example/quantization/README.md | 10 ++++----
example/quantization/imagenet_gen_qsym_onednn.py | 2 +-
include/mxnet/ndarray.h | 2 +-
src/c_api/c_api.cc | 6 ++---
src/ndarray/ndarray.cc | 16 ++++++------
src/operator/contrib/batch_norm_relu.cc | 4 +--
src/operator/nn/dnnl/dnnl_base-inl.h | 6 ++---
src/operator/nn/dnnl/dnnl_base.cc | 6 ++---
src/operator/nn/dnnl/dnnl_batch_norm-inl.h | 6 ++---
src/operator/nn/dnnl/dnnl_convolution.cc | 12 ++++-----
src/operator/nn/dnnl/dnnl_fully_connected.cc | 3 ++-
src/operator/nn/dnnl/dnnl_layer_norm.cc | 2 +-
src/operator/nn/dnnl/dnnl_pooling.cc | 10 ++++----
src/operator/nn/dnnl/dnnl_rnn.cc | 4 +--
src/operator/quantization/dnnl/dnnl_quantize-inl.h | 4 +--
.../quantization/dnnl/dnnl_quantize_v2-inl.h | 2 +-
.../quantization/dnnl/dnnl_requantize-inl.h | 2 +-
src/operator/quantization/quantized_batch_norm.cc | 2 +-
src/operator/quantization/quantized_conv.cc | 6 ++---
.../quantization/quantized_elemwise_add.cc | 4 +--
src/operator/quantization/quantized_pooling.cc | 6 ++---
.../subgraph/dnnl/dnnl_batch_dot_property.h | 2 +-
src/operator/subgraph/dnnl/dnnl_conv.cc | 2 +-
src/operator/subgraph/dnnl/dnnl_fc.cc | 2 +-
.../dnnl/dnnl_matmul_post_quantize_property.h | 2 +-
src/operator/tensor/cast_storage-inl.h | 4 +--
src/operator/tensor/elemwise_unary_op.h | 4 +--
tests/cpp/include/test_dnnl.h | 20 +++++++--------
tests/cpp/operator/dnnl_test.cc | 2 +-
tests/nightly/test_np_large_array.py | 2 +-
tests/python/dnnl/subgraphs/test_conv_subgraph.py | 6 ++---
tests/python/gpu/test_gluon_model_zoo_gpu.py | 4 +--
tests/python/quantization/test_quantization.py | 8 +++---
tests/python/unittest/test_numpy_gluon.py | 2 +-
tools/dependencies/README.md | 6 ++---
tools/pip/doc/CPU_ADDITIONAL.md | 2 +-
tools/pip/doc/CU101_ADDITIONAL.md | 2 +-
tools/pip/doc/CU102_ADDITIONAL.md | 2 +-
tools/pip/doc/CU110_ADDITIONAL.md | 2 +-
tools/pip/doc/CU112_ADDITIONAL.md | 2 +-
tools/pip/doc/NATIVE_ADDITIONAL.md | 2 +-
tools/staticbuild/README.md | 4 +--
76 files changed, 161 insertions(+), 160 deletions(-)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 19e1c49..196e007 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -62,9 +62,9 @@ option(USE_F16C "Build with x86 F16C instruction support" ON) # autodetects supp
option(USE_LAPACK "Build with lapack support" ON)
option(USE_MKL_LAYERNORM "Use layer normalization from MKL, which is currently slower than internal. No effect unless USE_BLAS=MKL (or mkl)." OFF)
if((NOT APPLE) AND (NOT MSVC) AND (CMAKE_HOST_SYSTEM_PROCESSOR STREQUAL "x86_64") AND (NOT CMAKE_CROSSCOMPILING))
- option(USE_ONEDNN "Build with ONEDNN support" ON)
+ option(USE_ONEDNN "Build with oneDNN support" ON)
else()
- option(USE_ONEDNN "Build with ONEDNN support" OFF)
+ option(USE_ONEDNN "Build with oneDNN support" OFF)
endif()
cmake_dependent_option(USE_INTGEMM "Build with x86_64 intgemm library for low-precision multiplication" ON "CMAKE_SYSTEM_PROCESSOR STREQUAL x86_64" OFF)
if(NOT MSVC)
diff --git a/benchmark/opperf/README.md b/benchmark/opperf/README.md
index 2d641b6..1a66575 100644
--- a/benchmark/opperf/README.md
+++ b/benchmark/opperf/README.md
@@ -40,7 +40,7 @@ Benchmarks are usually done end-to-end for a given Network Architecture. For exa
2. A standard Network Architecture like ResNet-50 is made up of many operators Ex: Convolution2D, Softmax, Dense and more. Consider the following scenarios:
    1. We improved the performance of Convolution2D operator, but due to a bug, Softmax performance went down. Overall, we may observe end to end benchmarks are running fine, we may miss out the performance degradation of a single operator which can accumulate and become untraceable.
    2. You need to see in a given network, which operator is taking maximum time and plan optimization work. With end to end benchmarks, it is hard to get more fine grained numbers at operator level.
-3. We need to know on different hardware infrastructure (Ex: CPU with ONEDNN, GPU with NVIDIA CUDA and cuDNN) how different operators performs. With these details, we can plan the optimization work at operator level, which could exponentially boost up end to end performance.
+3. We need to know on different hardware infrastructure (Ex: CPU with oneDNN, GPU with NVIDIA CUDA and cuDNN) how different operators performs. With these details, we can plan the optimization work at operator level, which could exponentially boost up end to end performance.
4. You want to have nightly performance tests across all operators in a deep learning framework to catch regressions early.
5. We can integrate this framework with a CI/CD system to run per operator performance tests for PRs. Example: When a PR modifies the kernel of TransposeConv2D, we can run benchmarks of TransposeConv2D operator to verify performance.
diff --git a/cd/README.md b/cd/README.md
index 083cb42..24ee1c0 100644
--- a/cd/README.md
+++ b/cd/README.md
@@ -22,18 +22,18 @@
## Introduction
-MXNet aims to support a variety of frontends, e.g. Python, Java, Perl, R, etc. as well as environments (Windows, Linux, Mac, with or without GPU, with or without ONEDNN support, etc.). This package contains a small continuous delivery (CD) framework used to automate the delivery nightly and release builds across our delivery channels.
+MXNet aims to support a variety of frontends, e.g. Python, Java, Perl, R, etc. as well as environments (Windows, Linux, Mac, with or without GPU, with or without oneDNN support, etc.). This package contains a small continuous delivery (CD) framework used to automate the delivery nightly and release builds across our delivery channels.
<!-- TODO: Add links to the actual jobs, once this is live on PROD -->
The CD process is driven by the [CD pipeline job](Jenkinsfile_cd_pipeline), which orchestrates the order in which the artifacts are delivered. For instance, first publish the libmxnet library before publishing the pip package. It does this by triggering the [release job](Jenkinsfile_release_job) with a specific set of parameters for each delivery channel. The release job executes the specific release pipeline for a delivery channel across all MXNet *variants*.
-A variant is a specific environment or features for which MXNet is compiled. For instance CPU, GPU with CUDA v10.1, CUDA v10.2 with ONEDNN support, etc.
+A variant is a specific environment or features for which MXNet is compiled. For instance CPU, GPU with CUDA v10.1, CUDA v10.2 with oneDNN support, etc.
-Currently, below variants are supported. All of these variants except native have ONEDNN backend enabled.
+Currently, below variants are supported. All of these variants except native have oneDNN backend enabled.
* *cpu*: CPU
-* *native*: CPU without ONEDNN
+* *native*: CPU without oneDNN
* *cu101*: CUDA 10.1
* *cu102*: CUDA 10.2
* *cu110*: CUDA 11.0
diff --git a/cd/utils/artifact_repository.md b/cd/utils/artifact_repository.md
index 3b673c8..e1c70cf 100644
--- a/cd/utils/artifact_repository.md
+++ b/cd/utils/artifact_repository.md
@@ -58,11 +58,11 @@ If not set, derived through the value of sys.platform (https://docs.python.org/3
Manually configured through the --variant argument. The current variants are: cpu, native, cu101, cu102, cu110, cu112.
-As long as the tool is being run from the MXNet code base, the runtime feature detection tool (https://github.com/larroy/mxnet/blob/dd432b7f241c9da2c96bcb877c2dc84e6a1f74d4/docs/api/python/libinfo/libinfo.md) can be used to detect whether the library has been compiled with MKL (library has ONEDNN feature enabled) and/or CUDA support (compiled with CUDA feature enabled).
+As long as the tool is being run from the MXNet code base, the runtime feature detection tool (https://github.com/larroy/mxnet/blob/dd432b7f241c9da2c96bcb877c2dc84e6a1f74d4/docs/api/python/libinfo/libinfo.md) can be used to detect whether the library has been compiled with oneDNN (library has oneDNN feature enabled) and/or CUDA support (compiled with CUDA feature enabled).
If it has been compiled with CUDA support, the output of /usr/local/cuda/bin/nvcc --version can be mined for the exact CUDA version (eg. 8.0, 9.0, etc.).
-By knowing which features are enabled on the binary, and if necessary, which CUDA version is installed on the machine, the value for the variant argument can be calculated. Eg. if CUDA features are enabled, and nvcc reports cuda version 10.2, then the variant would be cu102. If neither ONEDNN nor CUDA features are enabled, the variant would be native.
+By knowing which features are enabled on the binary, and if necessary, which CUDA version is installed on the machine, the value for the variant argument can be calculated. Eg. if CUDA features are enabled, and nvcc reports cuda version 10.2, then the variant would be cu102. If neither oneDNN nor CUDA features are enabled, the variant would be native.
**Dependency Linking**
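The variant-derivation rule described in the paragraph above can be sketched as a small helper. This is an illustrative sketch only, not the actual artifact_repository.py implementation; the function name and signature are assumptions:

```python
def probe_variant(features, cuda_version=None):
    """Derive the MXNet artifact variant from compiled-in features.

    features: dict such as {'ONEDNN': bool, 'CUDA': bool}, as reported by
    the runtime feature-detection tool.
    cuda_version: version string mined from `nvcc --version`, e.g. '102'
    for CUDA 10.2 (None when CUDA is absent).
    """
    if features.get('CUDA') and cuda_version:
        # CUDA builds are named after the toolkit version, e.g. cu102.
        return 'cu{}'.format(cuda_version)
    if features.get('ONEDNN'):
        return 'cpu'
    # Neither oneDNN nor CUDA features enabled.
    return 'native'
```

For example, `probe_variant({'ONEDNN': False, 'CUDA': False})` yields `native`, matching the rule in the text.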
diff --git a/cd/utils/artifact_repository.py b/cd/utils/artifact_repository.py
index 6234ac9..d7c6528 100644
--- a/cd/utils/artifact_repository.py
+++ b/cd/utils/artifact_repository.py
@@ -313,7 +313,7 @@ def probe_gpu_variant(mxnet_features: Dict[str, bool]) -> Optional[str]:
if cuda_version:
variant = 'cu{}'.format(cuda_version)
if not mxnet_features['ONEDNN']:
- RuntimeError('Error determining mxnet variant: ONEDNN should be enabled for cuda variants')
+ RuntimeError('Error determining mxnet variant: oneDNN should be enabled for cuda variants')
logger.debug('variant is: {}'.format(variant))
return variant
diff --git a/cd/utils/test_artifact_repository.py b/cd/utils/test_artifact_repository.py
index a3f0444..b75e2fb 100644
--- a/cd/utils/test_artifact_repository.py
+++ b/cd/utils/test_artifact_repository.py
@@ -161,7 +161,7 @@ class TestArtifactRepositoryTool(unittest.TestCase):
@patch('artifact_repository.get_libmxnet_features')
def test_probe_variant_native(self, mock_features):
"""
- Tests 'native' is returned if ONEDNN and CUDA features are OFF
+ Tests 'native' is returned if oneDNN and CUDA features are OFF
"""
mock_features.return_value = {'ONEDNN': False, 'CUDA': False}
self.assertEqual(probe_mxnet_variant('libmxnet.so'), 'native')
@@ -169,7 +169,7 @@ class TestArtifactRepositoryTool(unittest.TestCase):
@patch('artifact_repository.get_libmxnet_features')
def test_probe_variant_cpu(self, mock_features):
"""
- Tests 'cpu' is returned if ONEDNN is ON and CUDA is OFF
+ Tests 'cpu' is returned if oneDNN is ON and CUDA is OFF
"""
mock_features.return_value = {'ONEDNN': True, 'CUDA': False}
self.assertEqual(probe_mxnet_variant('libmxnet.so'), 'cpu')
@@ -178,7 +178,7 @@ class TestArtifactRepositoryTool(unittest.TestCase):
@patch('artifact_repository.get_cuda_version')
def test_probe_variant_cuda(self, mock_cuda_version, mock_features):
"""
- Tests 'cu102' is returned if ONEDNN is OFF and CUDA is ON and CUDA version is 10.2
+ Tests 'cu102' is returned if oneDNN is OFF and CUDA is ON and CUDA version is 10.2
"""
mock_features.return_value = {'ONEDNN': True, 'CUDA': True}
mock_cuda_version.return_value = '102'
diff --git a/ci/dev_menu.py b/ci/dev_menu.py
index a21129c..c86eb0f 100644
--- a/ci/dev_menu.py
+++ b/ci/dev_menu.py
@@ -141,12 +141,12 @@ COMMANDS = OrderedDict([
"ci/build.py --nvidiadocker --platform ubuntu_gpu /work/runtime_functions.sh build_ubuntu_gpu",
"ci/build.py --nvidiadocker --platform ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_python3_gpu",
]),
- ('[Docker] Python3 GPU+ONEDNN unittests',
+ ('[Docker] Python3 GPU+oneDNN unittests',
[
"ci/build.py --nvidiadocker --platform ubuntu_gpu /work/runtime_functions.sh build_ubuntu_gpu_onednn",
"ci/build.py --nvidiadocker --platform ubuntu_gpu /work/runtime_functions.sh unittest_ubuntu_python3_gpu",
]),
- ('[Docker] Python3 CPU Intel ONEDNN unittests',
+ ('[Docker] Python3 CPU oneDNN unittests',
[
"ci/build.py --platform ubuntu_cpu /work/runtime_functions.sh build_ubuntu_cpu_onednn",
"ci/build.py --platform ubuntu_cpu /work/runtime_functions.sh unittest_ubuntu_python3_cpu",
diff --git a/ci/docker/runtime_functions.sh b/ci/docker/runtime_functions.sh
index 19824ff..06a28d1 100755
--- a/ci/docker/runtime_functions.sh
+++ b/ci/docker/runtime_functions.sh
@@ -1420,7 +1420,7 @@ build_static_libmxnet() {
# Tests CD PyPI packaging in CI
ci_package_pypi() {
set -ex
- # copies onednn header files to 3rdparty/onednn/include/oneapi/dnnl/ as in CD
+ # copies oneDNN header files to 3rdparty/onednn/include/oneapi/dnnl/ as in CD
mkdir -p 3rdparty/onednn/include/oneapi/dnnl
cp include/onednn/oneapi/dnnl/dnnl_version.h 3rdparty/onednn/include/oneapi/dnnl/.
cp include/onednn/oneapi/dnnl/dnnl_config.h 3rdparty/onednn/include/oneapi/dnnl/.
diff --git a/ci/jenkins/Jenkins_steps.groovy b/ci/jenkins/Jenkins_steps.groovy
index e6f4080..cfd5f61 100644
--- a/ci/jenkins/Jenkins_steps.groovy
+++ b/ci/jenkins/Jenkins_steps.groovy
@@ -174,7 +174,7 @@ def compile_unix_mkl_cpu(lib_name) {
}
def compile_unix_onednn_cpu(lib_name) {
- return ['CPU: ONEDNN': {
+ return ['CPU: oneDNN': {
node(NODE_LINUX_CPU) {
ws('workspace/build-onednn-cpu') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -188,7 +188,7 @@ def compile_unix_onednn_cpu(lib_name) {
}
def compile_unix_onednn_mkl_cpu(lib_name) {
- return ['CPU: ONEDNN_MKL': {
+ return ['CPU: oneDNN-MKL': {
node(NODE_LINUX_CPU) {
ws('workspace/build-onednn-cpu') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -202,7 +202,7 @@ def compile_unix_onednn_mkl_cpu(lib_name) {
}
def compile_unix_onednn_gpu(lib_name) {
- return ['GPU: ONEDNN': {
+ return ['GPU: oneDNN': {
node(NODE_LINUX_CPU) {
ws('workspace/build-onednn-gpu') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -216,7 +216,7 @@ def compile_unix_onednn_gpu(lib_name) {
}
def compile_unix_onednn_nocudnn_gpu(lib_name) {
- return ['GPU: ONEDNN_CUDNNOFF': {
+ return ['GPU: oneDNN-CUDNNOFF': {
node(NODE_LINUX_CPU) {
ws('workspace/build-onednn-gpu-nocudnn') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -286,7 +286,7 @@ def compile_centos7_cpu(lib_name) {
}
def compile_centos7_cpu_onednn() {
- return ['CPU: CentOS 7 ONEDNN': {
+ return ['CPU: CentOS 7 oneDNN': {
node(NODE_LINUX_CPU) {
ws('workspace/build-centos7-onednn') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -353,7 +353,7 @@ def compile_unix_clang_tidy_cpu() {
}
def compile_unix_clang_6_onednn_cpu() {
- return ['CPU: Clang 6 ONEDNN': {
+ return ['CPU: Clang 6 oneDNN': {
node(NODE_LINUX_CPU) {
ws('workspace/build-cpu-onednn-clang6') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -367,7 +367,7 @@ def compile_unix_clang_6_onednn_cpu() {
// TODO(leezu) delete once DUSE_DIST_KVSTORE=ON builds in -WError build
def compile_unix_clang_10_onednn_cpu() {
- return ['CPU: Clang 10 ONEDNN': {
+ return ['CPU: Clang 10 oneDNN': {
node(NODE_LINUX_CPU) {
ws('workspace/build-cpu-onednn-clang100') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -531,7 +531,7 @@ def compile_windows_cpu(lib_name) {
}
def compile_windows_cpu_onednn(lib_name) {
- return ['Build CPU ONEDNN windows':{
+ return ['Build CPU oneDNN windows':{
node(NODE_WINDOWS_CPU) {
ws('workspace/build-cpu-onednn') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -545,7 +545,7 @@ def compile_windows_cpu_onednn(lib_name) {
}
def compile_windows_cpu_onednn_mkl(lib_name) {
- return ['Build CPU ONEDNN MKL windows':{
+ return ['Build CPU oneDNN MKL windows':{
node(NODE_WINDOWS_CPU) {
ws('workspace/build-cpu-onednn-mkl') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -587,7 +587,7 @@ def compile_windows_gpu(lib_name) {
}
def compile_windows_gpu_onednn(lib_name) {
- return ['Build GPU ONEDNN windows':{
+ return ['Build GPU oneDNN windows':{
node(NODE_WINDOWS_CPU) {
ws('workspace/build-gpu') {
timeout(time: max_time, unit: 'MINUTES') {
@@ -765,7 +765,7 @@ def test_unix_python3_onnx_cpu(lib_name) {
}
def test_unix_python3_onednn_cpu(lib_name) {
- return ['Python3: ONEDNN-CPU': {
+ return ['Python3: oneDNN-CPU': {
node(NODE_LINUX_CPU) {
ws('workspace/ut-python3-onednn-cpu') {
try {
@@ -782,7 +782,7 @@ def test_unix_python3_onednn_cpu(lib_name) {
}
def test_unix_python3_onednn_mkl_cpu(lib_name) {
- return ['Python3: ONEDNN-MKL-CPU': {
+ return ['Python3: oneDNN-MKL-CPU': {
node(NODE_LINUX_CPU) {
ws('workspace/ut-python3-onednn-mkl-cpu') {
try {
@@ -799,7 +799,7 @@ def test_unix_python3_onednn_mkl_cpu(lib_name) {
}
def test_unix_python3_onednn_gpu(lib_name) {
- return ['Python3: ONEDNN-GPU': {
+ return ['Python3: oneDNN-GPU': {
node(NODE_LINUX_GPU_G4) {
ws('workspace/ut-python3-onednn-gpu') {
try {
@@ -815,7 +815,7 @@ def test_unix_python3_onednn_gpu(lib_name) {
}
def test_unix_python3_onednn_nocudnn_gpu(lib_name) {
- return ['Python3: ONEDNN-GPU-NOCUDNN': {
+ return ['Python3: oneDNN-GPU-NOCUDNN': {
node(NODE_LINUX_GPU_G4) {
ws('workspace/ut-python3-onednn-gpu-nocudnn') {
try {
@@ -1009,7 +1009,7 @@ def test_windows_python3_gpu(lib_name) {
}
def test_windows_python3_gpu_onednn(lib_name) {
- return ['Python 3: ONEDNN-GPU Win':{
+ return ['Python 3: oneDNN-GPU Win':{
node(NODE_WINDOWS_GPU) {
timeout(time: max_time, unit: 'MINUTES') {
ws('workspace/ut-python-gpu') {
diff --git a/config/darwin.cmake b/config/darwin.cmake
index 1015a2f..d64379c 100644
--- a/config/darwin.cmake
+++ b/config/darwin.cmake
@@ -45,7 +45,7 @@ set(OPENCV_ROOT "" CACHE BOOL "OpenCV install path. Supports autodetection.")
set(USE_OPENMP OFF CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
diff --git a/config/distribution/darwin_cpu.cmake b/config/distribution/darwin_cpu.cmake
index ddda2ca..c7ce88a 100644
--- a/config/distribution/darwin_cpu.cmake
+++ b/config/distribution/darwin_cpu.cmake
@@ -24,7 +24,7 @@ set(USE_BLAS "apple" CACHE STRING "BLAS Vendor")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP OFF CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/darwin_cpu_mkl.cmake b/config/distribution/darwin_cpu_mkl.cmake
index f4e54a8..b49e203 100644
--- a/config/distribution/darwin_cpu_mkl.cmake
+++ b/config/distribution/darwin_cpu_mkl.cmake
@@ -25,7 +25,7 @@ set(BLA_STATIC ON CACHE BOOL "Use static libraries")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP OFF CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/darwin_native.cmake b/config/distribution/darwin_native.cmake
index 4b256c6..dd6815d 100644
--- a/config/distribution/darwin_native.cmake
+++ b/config/distribution/darwin_native.cmake
@@ -24,7 +24,7 @@ set(USE_BLAS "apple" CACHE STRING "BLAS Vendor")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP OFF CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN OFF CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN OFF CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cpu.cmake b/config/distribution/linux_cpu.cmake
index 9b8a979..cb0576f 100644
--- a/config/distribution/linux_cpu.cmake
+++ b/config/distribution/linux_cpu.cmake
@@ -23,7 +23,7 @@ set(USE_BLAS "open" CACHE STRING "BLAS Vendor")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cpu_mkl.cmake b/config/distribution/linux_cpu_mkl.cmake
index 3f8dcfc..afeb3bb 100644
--- a/config/distribution/linux_cpu_mkl.cmake
+++ b/config/distribution/linux_cpu_mkl.cmake
@@ -25,7 +25,7 @@ set(BLA_STATIC ON CACHE BOOL "Use static libraries")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu100.cmake b/config/distribution/linux_cu100.cmake
index 35ec5a3..78bcfae 100644
--- a/config/distribution/linux_cu100.cmake
+++ b/config/distribution/linux_cu100.cmake
@@ -25,7 +25,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu101.cmake b/config/distribution/linux_cu101.cmake
index 80f522d..bbe3e9f 100644
--- a/config/distribution/linux_cu101.cmake
+++ b/config/distribution/linux_cu101.cmake
@@ -27,7 +27,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu102.cmake b/config/distribution/linux_cu102.cmake
index d580354..a01662a 100644
--- a/config/distribution/linux_cu102.cmake
+++ b/config/distribution/linux_cu102.cmake
@@ -25,7 +25,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu110.cmake b/config/distribution/linux_cu110.cmake
index 0c239cb..1348da6 100644
--- a/config/distribution/linux_cu110.cmake
+++ b/config/distribution/linux_cu110.cmake
@@ -25,7 +25,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu112.cmake b/config/distribution/linux_cu112.cmake
index 031d129..87da1ad 100644
--- a/config/distribution/linux_cu112.cmake
+++ b/config/distribution/linux_cu112.cmake
@@ -25,7 +25,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_cu92.cmake b/config/distribution/linux_cu92.cmake
index 9466a52..a65a667 100644
--- a/config/distribution/linux_cu92.cmake
+++ b/config/distribution/linux_cu92.cmake
@@ -25,7 +25,7 @@ set(USE_CUDNN ON CACHE BOOL "Build with CUDNN support")
set(USE_NCCL ON CACHE BOOL "Build with NCCL support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/distribution/linux_native.cmake b/config/distribution/linux_native.cmake
index a0900f3..0ea1816 100644
--- a/config/distribution/linux_native.cmake
+++ b/config/distribution/linux_native.cmake
@@ -23,7 +23,7 @@ set(USE_BLAS "open" CACHE STRING "BLAS Vendor")
set(USE_CUDA OFF CACHE BOOL "Build with CUDA support")
set(USE_OPENCV ON CACHE BOOL "Build with OpenCV support")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN OFF CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN OFF CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
set(USE_TVM_OP OFF CACHE BOOL "Enable use of TVM operator build system.")
set(USE_SSE ON CACHE BOOL "Build with x86 SSE instruction support")
diff --git a/config/linux.cmake b/config/linux.cmake
index 0a0f2d9..ec02d9d 100644
--- a/config/linux.cmake
+++ b/config/linux.cmake
@@ -62,7 +62,7 @@ set(OPENCV_ROOT "" CACHE BOOL "OpenCV install path. Supports autodetection.")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
diff --git a/config/linux_gpu.cmake b/config/linux_gpu.cmake
index 42ebc11..53e096f 100644
--- a/config/linux_gpu.cmake
+++ b/config/linux_gpu.cmake
@@ -66,7 +66,7 @@ set(OPENCV_ROOT "" CACHE BOOL "OpenCV install path. Supports autodetection.")
set(USE_OPENMP ON CACHE BOOL "Build with Openmp support")
-set(USE_ONEDNN ON CACHE BOOL "Build with ONEDNN support")
+set(USE_ONEDNN ON CACHE BOOL "Build with oneDNN support")
set(USE_LAPACK ON CACHE BOOL "Build with lapack support")
diff --git a/docs/python_docs/python/tutorials/index.rst b/docs/python_docs/python/tutorials/index.rst
index e9a61be..7a6bae3 100644
--- a/docs/python_docs/python/tutorials/index.rst
+++ b/docs/python_docs/python/tutorials/index.rst
@@ -85,7 +85,7 @@ Performance
.. card::
:title: oneDNN
- :link: performance/backend/mkldnn/index.html
+ :link: performance/backend/dnnl/index.html
How to get the most from your CPU by using oneDNN.
diff --git a/docs/python_docs/python/tutorials/performance/backend/profiler.md b/docs/python_docs/python/tutorials/performance/backend/profiler.md
index a54892d..216722a 100644
--- a/docs/python_docs/python/tutorials/performance/backend/profiler.md
+++ b/docs/python_docs/python/tutorials/performance/backend/profiler.md
@@ -210,8 +210,8 @@ Let's zoom in to check the time taken by operators
The above picture visualizes the sequence in which the operators were executed and the time taken by each operator.
-### Profiling ONEDNN Operators
-Reagrding ONEDNN operators, the library has already provided the internal profiling tool. Firstly, you need set `DNNL_VERBOSE=1` to enable internal profiler.
+### Profiling oneDNN Operators
+Reagrding oneDNN operators, the library has already provided the internal profiling tool. Firstly, you need set `DNNL_VERBOSE=1` to enable internal profiler.
`$ DNNL_VERBOSE=1 python my_script.py > dnnl_verbose.log`
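Once a `dnnl_verbose.log` has been captured as above, its per-primitive timings can be aggregated with a short script. This is a sketch that assumes the comma-separated `dnnl_verbose` line format, in which the fourth field names the primitive and the last field is the execution time in milliseconds; the helper name is illustrative:

```python
from collections import defaultdict

def summarize_dnnl_verbose(lines):
    """Sum execution time (ms) per primitive kind from DNNL_VERBOSE output."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.strip().split(',')
        # Only 'exec' records carry timings; skip banner/info lines.
        if len(fields) < 5 or fields[0] != 'dnnl_verbose' or fields[1] != 'exec':
            continue
        try:
            totals[fields[3]] += float(fields[-1])
        except ValueError:
            continue
    return dict(totals)

# Example usage:
# with open('dnnl_verbose.log') as f:
#     print(summarize_dnnl_verbose(f))
```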
diff --git a/docs/static_site/src/_includes/get_started/cloud/cpu.md b/docs/static_site/src/_includes/get_started/cloud/cpu.md
index 810233f..6813f37 100644
--- a/docs/static_site/src/_includes/get_started/cloud/cpu.md
+++ b/docs/static_site/src/_includes/get_started/cloud/cpu.md
@@ -13,4 +13,4 @@ the [Download page](https://mxnet.apache.org/get_started/download).
* **Amazon Web Services**
- [AWS Deep Learning AMI](https://aws.amazon.com/machine-learning/amis/) - Preinstalled Conda environments
-for Python 2 or 3 with MXNet and ONEDNN.
+for Python 2 or 3 with MXNet and oneDNN.
diff --git a/docs/static_site/src/_includes/get_started/cloud/gpu.md
b/docs/static_site/src/_includes/get_started/cloud/gpu.md
index 3a951ab..c21ba38 100644
--- a/docs/static_site/src/_includes/get_started/cloud/gpu.md
+++ b/docs/static_site/src/_includes/get_started/cloud/gpu.md
@@ -18,7 +18,7 @@
VM](https://docs.nvidia.com/ngc/ngc-alibaba-setup-guide/launching-nv-cloud-vm-co
MXNet models
- [AWS Deep Learning AMI](https://aws.amazon.com/machine-learning/amis/) -
Preinstalled
Conda environments
-for Python 2 or 3 with MXNet, CUDA, cuDNN, ONEDNN, and AWS Elastic Inference
+for Python 2 or 3 with MXNet, CUDA, cuDNN, oneDNN, and AWS Elastic Inference
- [Dynamic Training on
AWS](https://github.com/awslabs/dynamic-training-with-apache-mxnet-on-aws) -
experimental manual EC2 setup or semi-automated CloudFormation setup
diff --git
a/docs/static_site/src/pages/api/cpp/docs/tutorials/multi_threaded_inference.md
b/docs/static_site/src/pages/api/cpp/docs/tutorials/multi_threaded_inference.md
index 086e440..89fbfae 100644
---
a/docs/static_site/src/pages/api/cpp/docs/tutorials/multi_threaded_inference.md
+++
b/docs/static_site/src/pages/api/cpp/docs/tutorials/multi_threaded_inference.md
@@ -163,7 +163,7 @@ The above code outputs results for different threads and
cleans up the thread sa
1. Only operators tested with the existing model coverage are supported. Other
operators and operator types (stateful operators, custom operators) are not
supported. Existing model coverage is as follows (this list will keep growing
as we test more models with different model types):
-|Models Tested|ONEDNN|CUDNN|NO-CUDNN|
+|Models Tested|oneDNN|CUDNN|NO-CUDNN|
| --- | --- | --- | --- |
| imagenet1k-resnet-18 | Yes | Yes | Yes |
| imagenet1k-resnet-152 | Yes | Yes | Yes |
diff --git a/docs/static_site/src/pages/api/faq/cloud.md
b/docs/static_site/src/pages/api/faq/cloud.md
index 0b7498e..9668f4b 100644
--- a/docs/static_site/src/pages/api/faq/cloud.md
+++ b/docs/static_site/src/pages/api/faq/cloud.md
@@ -54,8 +54,8 @@ on how to connect to a Jupyter notebook running on an EC2
instance.
### Set Up an EC2 GPU Instance from Scratch
[Deep Learning Base
AMIs](https://aws.amazon.com/marketplace/search/results?x=0&y=0&searchTerms=Deep+Learning+Base+AMI)
-provide a foundational image with NVIDIA CUDA, cuDNN, GPU drivers, Intel
-ONEDNN, Docker and Nvidia-Docker, etc. for deploying your own custom deep
+provide a foundational image with NVIDIA CUDA, cuDNN, GPU drivers, oneDNN,
+Docker and Nvidia-Docker, etc. for deploying your own custom deep
learning environment. You may follow the [MXNet Build From Source
instructions](https://mxnet.apache.org/get_started/build_from_source) easily on
the Deep Learning Base AMIs.
diff --git a/docs/static_site/src/pages/api/faq/env_var.md
b/docs/static_site/src/pages/api/faq/env_var.md
index 1ecd30f..dad481c 100644
--- a/docs/static_site/src/pages/api/faq/env_var.md
+++ b/docs/static_site/src/pages/api/faq/env_var.md
@@ -372,12 +372,12 @@ If ctypes is used, it must be
`mxnet._ctypes.ndarray.NDArrayBase`.
* MXNET_ONEDNN_ENABLED
- Values: 0, 1 ```(default=1)```
- - Flag to enable or disable ONEDNN accelerator. On by default.
- - Only applies to mxnet that has been compiled with ONEDNN (```pip install
mxnet``` or built from source with ```USE_ONEDNN=1```)
+ - Flag to enable or disable the oneDNN accelerator. On by default.
+ - Only applies to mxnet that has been compiled with oneDNN (```pip install
mxnet``` or built from source with ```USE_ONEDNN=1```)
* MXNET_ONEDNN_CACHE_NUM
- Values: Int ```(default=-1)```
- - Flag to set num of elements that ONEDNN cache can hold. Default is -1
which means cache size is unbounded. Should only be set if your model has
variable input shapes, as cache size may grow unbounded. The number represents
the number of items in the cache and is proportional to the number of layers
that use ONEDNN and different input shape.
+ - Sets the number of elements that the oneDNN cache can hold. Default is -1,
which means the cache size is unbounded. Should only be set if your model has
variable input shapes, as the cache may otherwise grow unbounded. The number
represents the number of items in the cache and is proportional to the number
of layers that use oneDNN and the number of different input shapes.
* MXNET_ONEDNN_FORCE_FC_AB_FORMAT
- Values: 0, 1 ```(default=0)```
@@ -446,7 +446,7 @@ If ctypes is used, it must be
`mxnet._ctypes.ndarray.NDArrayBase`.
* MXNET_USE_ONEDNN_RNN
- Values: 0(false) or 1(true) ```(default=1)```
- - This variable controls whether to use the ONEDNN backend in fused RNN
operator for CPU context. There are two fusion implementations of RNN operator
in MXNet. The ONEDNN implementation has a better performance than the naive
one, but the latter is more stable in the backward operation currently.
+ - This variable controls whether to use the oneDNN backend in the fused RNN
operator for the CPU context. There are two fused implementations of the RNN
operator in MXNet. The oneDNN implementation has better performance than the
naive one, but the latter is currently more stable in the backward operation.
* MXNET_FC_TRUE_FP16
- Values: 0(false) or 1(true) ```(default=0)```
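Since these variables are read when the library loads, they have to be exported before MXNet is imported. A minimal sketch using the variable names documented above (the values shown are arbitrary, and the switches only take effect on a build compiled with oneDNN):

```python
# Sketch: configure the oneDNN switches documented above. These must be
# set before `import mxnet`, because they are read at library load time.
import os

os.environ["MXNET_ONEDNN_ENABLED"] = "0"     # disable the oneDNN accelerator
os.environ["MXNET_ONEDNN_CACHE_NUM"] = "64"  # bound the primitive cache size

# import mxnet as mx  # the import would now pick up the settings above
print(os.environ["MXNET_ONEDNN_ENABLED"], os.environ["MXNET_ONEDNN_CACHE_NUM"])
```

Exporting the same variables in the shell (`MXNET_ONEDNN_ENABLED=0 python my_script.py`) is equivalent.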
diff --git a/docs/static_site/src/pages/api/faq/large_tensor_support.md
b/docs/static_site/src/pages/api/faq/large_tensor_support.md
index 247720f..c7c3f74 100644
--- a/docs/static_site/src/pages/api/faq/large_tensor_support.md
+++ b/docs/static_site/src/pages/api/faq/large_tensor_support.md
@@ -141,9 +141,9 @@ Backward pass is partially supported and not completely
tested, so it is conside
Not supported:
-* GPU and ONEDNN.
+* GPU and oneDNN.
* Windows, ARM or any operating system other than Ubuntu
-* Any permutation of MXNet wheel that contains ONEDNN.
+* Any permutation of MXNet wheel that contains oneDNN.
* Other language bindings like Scala, Java, R, and Julia.
diff --git a/docs/static_site/src/pages/api/faq/tensor_inspector_tutorial.md
b/docs/static_site/src/pages/api/faq/tensor_inspector_tutorial.md
index 1212524..3e6a74c 100644
--- a/docs/static_site/src/pages/api/faq/tensor_inspector_tutorial.md
+++ b/docs/static_site/src/pages/api/faq/tensor_inspector_tutorial.md
@@ -168,7 +168,7 @@ Notice: in `interactive_print()`, you could also do value
dumping with command "
### Test Coverage and Limitations
-This utility has been tested on Mac and Ubuntu with and without CUDNN and
ONEDNN. Supports for `Tensor`, `TBlob`, and `NDArray`, as well as for CPU and
GPU have been manually tested.
+This utility has been tested on Mac and Ubuntu with and without CUDNN and
oneDNN. Support for `Tensor`, `TBlob`, and `NDArray`, as well as for CPU and
GPU, has been manually tested.
Currently, this utility only supports non-empty tensors and tensors with known
shapes i.e. `tb_.ndim() > 0`. Also, this utility only supports dense `NDArray`
objects, i.e. when the type is `kDefaultStorage`.
diff --git a/example/README.md b/example/README.md
index bd985a2..4e9023a 100644
--- a/example/README.md
+++ b/example/README.md
@@ -109,7 +109,7 @@ If your tutorial depends on specific packages, simply add
them to this provision
* [Kaggle 2nd national data science bowl](kaggle-ndsb2) - a tutorial for
Kaggle Second Nation Data Science Bowl
* [Multi-task Learning](multi-task) - how to use MXNet for multi-task learning
* [Profiling](profiler) - generate profiling results in json files
-* [Quantization and Calibration Examples](quantization) - examples of
quantizing a FP32 model to INT8 and performing low-precision inference with
Intel ONEDNN on CPU or cuDNN on GPU
+* [Quantization and Calibration Examples](quantization) - examples of
quantizing a FP32 model to INT8 and performing low-precision inference with
oneDNN on CPU or cuDNN on GPU
* [Recommender Systems](recommenders) - examples of how to build various kinds
of recommender systems
* [Restricted Boltzmann Machine](restricted-boltzmann-machine) - an example of
the binary restricted Boltzmann machine learning MNIST
* [Single Shot MultiBox Detector](ssd) - SSD object recognition example
diff --git a/example/quantization/README.md b/example/quantization/README.md
index 3370ada..fa060b9 100644
--- a/example/quantization/README.md
+++ b/example/quantization/README.md
@@ -20,11 +20,11 @@
# Model Quantization with Calibration Examples
-This folder contains examples of quantizing a FP32 model with Intel® oneAPI
Deep Neural Network Library (oneDNN) to (U)INT8 model.
+This folder contains examples of quantizing a FP32 model to a (U)INT8 model
with the oneAPI Deep Neural Network Library (oneDNN).
-<h2 id="1">Model Quantization with Intel® oneDNN</h2>
+<h2 id="1">Model Quantization with oneDNN</h2>
-Intel® oneDNN supports quantization with subgraph features on Intel® CPU
Platform and can bring performance improvements on the [Intel® Xeon® Scalable
Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html).
+oneDNN supports quantization with subgraph features on Intel® CPU Platform and
can bring performance improvements on the [Intel® Xeon® Scalable
Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html).
```
usage: python imagenet_gen_qsym_onednn.py [-h] [--model MODEL] [--epoch EPOCH]
@@ -38,7 +38,7 @@ usage: python imagenet_gen_qsym_onednn.py [-h] [--model
MODEL] [--epoch EPOCH]
[--quantized-dtype {auto,int8,uint8}]
[--quiet]
-Generate a calibrated quantized model from a FP32 model with Intel oneDNN
support
+Generate a calibrated quantized model from a FP32 model with oneDNN support
optional arguments:
-h, --help show this help message and exit
@@ -87,7 +87,7 @@ optional arguments:
--quiet suppress most of log
```
-A new benchmark script `launch_inference_onednn.sh` has been designed to
launch performance benchmark for FP32 or INT8 image-classification models with
Intel® oneDNN.
+A new benchmark script `launch_inference_onednn.sh` has been designed to
launch performance benchmarks for FP32 or INT8 image-classification models
with oneDNN.
```
usage: bash ./launch_inference_onednn.sh -s symbol_file [-b batch_size] [-iter
iteration] [-ins instance] [-c cores/instance] [-h]
diff --git a/example/quantization/imagenet_gen_qsym_onednn.py
b/example/quantization/imagenet_gen_qsym_onednn.py
index c8e6709..65454a3 100644
--- a/example/quantization/imagenet_gen_qsym_onednn.py
+++ b/example/quantization/imagenet_gen_qsym_onednn.py
@@ -100,7 +100,7 @@ def get_exclude_symbols(model_name, exclude_first_conv):
if __name__ == '__main__':
- parser = argparse.ArgumentParser(description='Generate a calibrated
quantized model from a FP32 model with Intel oneDNN support')
+ parser = argparse.ArgumentParser(description='Generate a calibrated
quantized model from a FP32 model with oneDNN support')
parser.add_argument('--model', type=str, default='resnet50_v1',
help='model to be quantized. If no-pretrained is set
then'
'model must be provided to `model` directory in
the same path'
diff --git a/include/mxnet/ndarray.h b/include/mxnet/ndarray.h
index 5e6af4d..0e7fee1 100644
--- a/include/mxnet/ndarray.h
+++ b/include/mxnet/ndarray.h
@@ -739,7 +739,7 @@ class NDArray {
*/
explicit NDArray(const dnnl::memory::desc& md);
/*
- * Test if the data is stored in one of special DNNL format.
+ * Test if the data is stored in one of special DNNL formats.
*/
bool IsDNNLData() const {
return ptr_->IsDNNL();
diff --git a/src/c_api/c_api.cc b/src/c_api/c_api.cc
index d69db4e..0bc54bf 100644
--- a/src/c_api/c_api.cc
+++ b/src/c_api/c_api.cc
@@ -163,7 +163,7 @@ void CustomFComputeDispatcher(const std::string op_name,
std::vector<size_t> in_verIDs, out_verIDs;
std::vector<const char*> in_dev_type, out_dev_type;
std::vector<int> in_dev_id, out_dev_id;
- std::vector<NDArray> conv_mkl; // converted NDArrays from DNNL format
+ std::vector<NDArray> conv_dnnl; // converted NDArrays from DNNL format
// Extra data for sparse inputs and outputs.
std::vector<int> in_stypes(inputs.size(), 0), out_stypes(outputs.size(), 0);
@@ -179,8 +179,8 @@ void CustomFComputeDispatcher(const std::string op_name,
// reorder data if in DNNL format
if (in_nd->IsDNNLData()) {
// convert from DNNL
- conv_mkl.push_back(in_nd->Reorder2Default());
- in_nd = &(conv_mkl.back());
+ conv_dnnl.push_back(in_nd->Reorder2Default());
+ in_nd = &(conv_dnnl.back());
}
#endif
// pull out parts to pass over to library
diff --git a/src/ndarray/ndarray.cc b/src/ndarray/ndarray.cc
index cdbb764..8c955bd 100644
--- a/src/ndarray/ndarray.cc
+++ b/src/ndarray/ndarray.cc
@@ -603,7 +603,7 @@ void NDArray::Chunk::SetMKLMem(const mxnet::TShape& shape,
int dtype) {
for (size_t i = 0; i < dims.size(); i++)
dims[i] = shape[i];
} else {
- LOG(FATAL) << "DNNL doesn't support " << shape.ndim() << " dimensions";
+ LOG(FATAL) << "oneDNN doesn't support " << shape.ndim() << " dimensions";
}
dnnl::memory::format_tag layout = dnnl::memory::format_tag::undef;
switch (dims.size()) {
@@ -626,7 +626,7 @@ void NDArray::Chunk::SetMKLMem(const mxnet::TShape& shape,
int dtype) {
layout = dnnl::memory::format_tag::abcdef;
break;
default:
- LOG(FATAL) << "Not implemented dimension (" << dims.size() << ") for
DNNL";
+ LOG(FATAL) << "Not implemented dimension (" << dims.size() << ") for
oneDNN";
}
dnnl::memory::desc data_md{dims, get_dnnl_type(dtype), layout};
if (shandle.dptr == nullptr) {
@@ -639,7 +639,7 @@ void NDArray::Chunk::SetMKLMem(const mxnet::TShape& shape,
int dtype) {
const dnnl::memory* NDArray::GetDNNLData(const dnnl::memory::desc& desc) const
{
if (desc.get_size() != shape().Size() * GetTypeSize(dtype_)) {
- LOG(FATAL) << "The size of NDArray doesn't match the requested DNNL memory
desc";
+ LOG(FATAL) << "The size of NDArray doesn't match the requested oneDNN
memory desc";
return nullptr;
}
const dnnl::memory* mem = GetDNNLData();
@@ -705,7 +705,7 @@ NDArray NDArray::Reorder2Default() const {
if (!ptr_->dnnl_mem_->IsDNNL())
return *this;
- // create new ndarray from dnnl layout
+ // create new ndarray from dnnl layout
dnnl::memory::desc from_desc = ptr_->dnnl_mem_->GetDesc();
mxnet::TShape tshape(from_desc.data.ndims, -1);
for (int i = 0; i < from_desc.data.ndims; i++)
@@ -863,7 +863,7 @@ void NDArray::CopyFrom(const dnnl::memory& mem) {
return;
CHECK(mem.get_desc().get_size() == shape().Size() * GetTypeSize(dtype_))
- << "The size of NDArray doesn't match the requested DNNL memory desc";
+ << "The size of NDArray doesn't match the requested oneDNN memory desc";
// If this array uses DNNL layout, we have to make sure it's not a view.
// Otherwise, we'll have to change the layout inside the array.
@@ -876,8 +876,8 @@ void NDArray::CopyFrom(const dnnl::memory& mem) {
dnnl::memory* NDArray::CreateDNNLData(const dnnl::memory::desc& desc) {
if (desc.get_size() != shape().Size() * GetTypeSize(dtype_)) {
- LOG(FATAL) << "The size of NDArray doesn't match the requested DNNL memory
desc. "
- << "DNNL memory requests for " << desc.get_size() << " bytes,
but got "
+ LOG(FATAL) << "The size of NDArray doesn't match the requested oneDNN
memory desc. "
+ << "oneDNN memory requests for " << desc.get_size() << " bytes,
but got "
<< shape().Size() * GetTypeSize(dtype_) << " bytes from
NDArray";
return nullptr;
}
@@ -937,7 +937,7 @@ void NDArray::SetTBlob() const {
auto stype = storage_type();
if (stype == kDefaultStorage) {
#if MXNET_USE_ONEDNN == 1
- CHECK(!IsDNNLData()) << "We can't generate TBlob for DNNL data. "
+ CHECK(!IsDNNLData()) << "We can't generate TBlob for oneDNN data. "
<< "Please use Reorder2Default() to generate a new
NDArray first";
#endif
dptr += byte_offset_;
diff --git a/src/operator/contrib/batch_norm_relu.cc
b/src/operator/contrib/batch_norm_relu.cc
index d223c65..e15bcbe 100644
--- a/src/operator/contrib/batch_norm_relu.cc
+++ b/src/operator/contrib/batch_norm_relu.cc
@@ -158,7 +158,7 @@ void BatchNormWithReLUComputeExCPU(const nnvm::NodeAttrs&
attrs,
});
return;
}
- LOG(FATAL) << "BatchNormWithReLU operator only supports DNNL Backend.";
+ LOG(FATAL) << "BatchNormWithReLU operator only supports oneDNN Backend.";
}
void BatchNormWithReLUGradComputeExCPU(const nnvm::NodeAttrs& attrs,
@@ -174,7 +174,7 @@ void BatchNormWithReLUGradComputeExCPU(const
nnvm::NodeAttrs& attrs,
DNNLBatchNormBackward<float>(attrs, ctx, inputs, req, outputs, fuse_relu);
return;
}
- LOG(FATAL) << "BatchNormWithReLU operator only supports DNNL Backend.";
+ LOG(FATAL) << "BatchNormWithReLU operator only supports oneDNN Backend.";
}
#endif
diff --git a/src/operator/nn/dnnl/dnnl_base-inl.h
b/src/operator/nn/dnnl/dnnl_base-inl.h
index 3ec2e32..7951569 100644
--- a/src/operator/nn/dnnl/dnnl_base-inl.h
+++ b/src/operator/nn/dnnl/dnnl_base-inl.h
@@ -225,7 +225,7 @@ static inline dnnl::memory::data_type get_dnnl_type(int
dtype) {
case mshadow::kUint8:
return dnnl::memory::data_type::u8;
default:
- LOG(FATAL) << "unknown type for DNNL :" << static_cast<int>(dtype);
+ LOG(FATAL) << "unknown type for oneDNN :" << static_cast<int>(dtype);
return dnnl::memory::data_type::undef;
}
}
@@ -258,7 +258,7 @@ static inline int get_mxnet_type(dnnl_data_type_t dtype) {
case dnnl::memory::data_type::u8:
return mshadow::kUint8;
default:
- LOG(FATAL) << "unknown DNNL type";
+ LOG(FATAL) << "unknown oneDNN data type";
return mshadow::kFloat32;
}
}
@@ -321,7 +321,7 @@ inline static dnnl::memory::desc GetWeightDesc(const
NDArray& arr,
} else {
const auto ndim = arr.shape().ndim();
CHECK((ndim == 3) || (ndim == 4) || (ndim == 5))
- << "DNNL weight currently supports 3d or 4d or 5d layout";
+ << "oneDNN weight currently supports 3d or 4d or 5d layout";
auto tz = dnnl::memory::dims{0};
int N = 0, C = 1, H = 2, W = 3;
int D = -1;
diff --git a/src/operator/nn/dnnl/dnnl_base.cc
b/src/operator/nn/dnnl/dnnl_base.cc
index adcd8f2..73e9225 100644
--- a/src/operator/nn/dnnl/dnnl_base.cc
+++ b/src/operator/nn/dnnl/dnnl_base.cc
@@ -76,8 +76,8 @@ dnnl::memory* TmpMemMgr::Alloc(const dnnl::memory::desc& md) {
// the space by itself. Thus, we just let it continue for estimating the
maximum
// required space size. It will be allocated at next call.
if (this->curr_mem && dmlc::GetEnv("MXNET_ONEDNN_DEBUG", false)) {
- LOG(WARNING) << "DNNL debug message: The rest of the temporary space is
not "
- << "adequate for allocating " << md.get_size() << " bytes.
Thus, DNNL "
+ LOG(WARNING) << "oneDNN debug message: The rest of the temporary space
is not "
+ << "adequate for allocating " << md.get_size() << " bytes.
Thus, oneDNN "
<< "allocates the space by itself.";
}
dnnl_mem_ptr ret(new dnnl::memory(md, CpuEngine::Get()->get_engine()));
@@ -330,7 +330,7 @@ dnnl_format_tag_t GetDefaultFormat(int num_dims) {
case 6:
return dnnl_abcdef;
default:
- LOG(FATAL) << "Not implemented dimension (" << num_dims << ") for DNNL";
+ LOG(FATAL) << "Not implemented dimension (" << num_dims << ") for
oneDNN";
return dnnl_format_tag_undef;
}
}
diff --git a/src/operator/nn/dnnl/dnnl_batch_norm-inl.h
b/src/operator/nn/dnnl/dnnl_batch_norm-inl.h
index f7dc97b..3902b2e 100644
--- a/src/operator/nn/dnnl/dnnl_batch_norm-inl.h
+++ b/src/operator/nn/dnnl/dnnl_batch_norm-inl.h
@@ -223,7 +223,7 @@ void DNNLBatchNormForward(const nnvm::NodeAttrs& attrs,
workspace = &outputs[3];
auto engine = CpuEngine::Get()->get_engine();
if (workspace == nullptr) {
- LOG(FATAL) << "DNNL BatchNorm: incorrect workspace input";
+ LOG(FATAL) << "oneDNN BatchNorm: incorrect workspace input";
}
auto ws = std::make_shared<dnnl::memory>(
fwd.GetPd().workspace_desc(), engine,
workspace->GetDNNLData()->get_data_handle());
@@ -257,7 +257,7 @@ void DNNLBatchNormForward(const nnvm::NodeAttrs& attrs,
}
}
} else { // no input gamma and beta
- LOG(FATAL) << "DNNL batch normalization: should not reach here ...";
+ LOG(FATAL) << "oneDNN batch normalization: should not reach here ...";
}
}
@@ -478,7 +478,7 @@ void DNNLBatchNormBackward(const nnvm::NodeAttrs& attrs,
}
}
} else {
- LOG(FATAL) << "DNNL batch normalization backward: should not reach here
...";
+ LOG(FATAL) << "oneDNN batch normalization backward: should not reach here
...";
}
}
} // namespace op
diff --git a/src/operator/nn/dnnl/dnnl_convolution.cc
b/src/operator/nn/dnnl/dnnl_convolution.cc
index 7910f65..314bc62 100644
--- a/src/operator/nn/dnnl/dnnl_convolution.cc
+++ b/src/operator/nn/dnnl/dnnl_convolution.cc
@@ -84,7 +84,7 @@ std::shared_ptr<dnnl::convolution_forward::primitive_desc>
GetConvFwdImpl(
padding[1] = param.conv_param.pad[1];
padding[2] = param.conv_param.pad[2];
} else {
- LOG(FATAL) << "Unexpected DNNL Conv kernel size " <<
param.conv_param.kernel.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv kernel size " <<
param.conv_param.kernel.ndim()
<< ", supporting only 1 or 2 or 3.";
}
dnnl::primitive_attr attr;
@@ -168,7 +168,7 @@ std::shared_ptr<dnnl::convolution_forward::primitive_desc>
GetConvFwdImpl(
dilates[1] = param.conv_param.dilate[1] - 1;
dilates[2] = param.conv_param.dilate[2] - 1;
} else {
- LOG(FATAL) << "Unexpected DNNL Conv dilate size " <<
param.conv_param.dilate.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv dilate size " <<
param.conv_param.dilate.ndim()
<< ", supporting only 1 or 2 or 3.";
}
if (bias_md_ptr == nullptr) {
@@ -235,7 +235,7 @@ static
std::shared_ptr<dnnl::convolution_backward_data::primitive_desc> GetConvB
padding[1] = param.pad[1];
padding[2] = param.pad[2];
} else {
- LOG(FATAL) << "Unexpected DNNL Conv kernel size " << param.kernel.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv kernel size " << param.kernel.ndim()
<< ", supporting only 1 or 2 or 3.";
}
@@ -278,7 +278,7 @@ static
std::shared_ptr<dnnl::convolution_backward_data::primitive_desc> GetConvB
dilates[1] = param.dilate[1] - 1;
dilates[2] = param.dilate[2] - 1;
} else {
- LOG(FATAL) << "Unexpected DNNL Conv dilate size " << param.dilate.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv dilate size " <<
param.dilate.ndim()
<< ", supporting only 1 or 2 or 3.";
}
dnnl::convolution_backward_data::desc
desc(dnnl::algorithm::convolution_direct,
@@ -331,7 +331,7 @@ static
std::shared_ptr<dnnl::convolution_backward_weights::primitive_desc> GetCo
padding[1] = param.pad[1];
padding[2] = param.pad[2];
} else {
- LOG(FATAL) << "Unexpected DNNL Conv kernel size " << param.kernel.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv kernel size " << param.kernel.ndim()
<< ", supporting only 1 or 2 or 3.";
}
@@ -385,7 +385,7 @@ static
std::shared_ptr<dnnl::convolution_backward_weights::primitive_desc> GetCo
dilates[1] = param.dilate[1] - 1;
dilates[2] = param.dilate[2] - 1;
} else {
- LOG(FATAL) << "Unexpected DNNL Conv dilate size " << param.dilate.ndim()
+ LOG(FATAL) << "Unexpected oneDNN Conv dilate size " <<
param.dilate.ndim()
<< ", supporting only 1 or 2 or 3.";
}
if (bias == nullptr) {
diff --git a/src/operator/nn/dnnl/dnnl_fully_connected.cc
b/src/operator/nn/dnnl/dnnl_fully_connected.cc
index 7879497..eca90b7 100644
--- a/src/operator/nn/dnnl/dnnl_fully_connected.cc
+++ b/src/operator/nn/dnnl/dnnl_fully_connected.cc
@@ -65,7 +65,8 @@ dnnl::inner_product_forward::primitive_desc
GetFCFwdImpl(const DNNLFCFullParam&
return dnnl::inner_product_forward::primitive_desc(desc, attr, engine);
} catch (dnnl::error& e) {
if (e.status == dnnl_unimplemented && full_param.dnnl_param.quantized) {
- LOG(ERROR) << "AVX512-BW support or DNNL v0.18 is required for INT8
fully_connected.";
+ LOG(ERROR)
+ << "AVX512-BW support or oneDNN v0.18 or later is required for
INT8 fully_connected.";
} else {
LOG(ERROR) << e.message;
}
diff --git a/src/operator/nn/dnnl/dnnl_layer_norm.cc
b/src/operator/nn/dnnl/dnnl_layer_norm.cc
index 2e720d0..2c938db 100644
--- a/src/operator/nn/dnnl/dnnl_layer_norm.cc
+++ b/src/operator/nn/dnnl/dnnl_layer_norm.cc
@@ -112,7 +112,7 @@ inline dnnl::memory::desc GetMeanVarDesc(const
dnnl::memory::data_type& dtype,
}
inline dnnl::memory GetScaleShiftMem(const NDArray& gamma, const NDArray&
beta) {
- // OneDNN takes gamma and beta as one SCALE_SHIFT tensor when both scale and
shift are used. In
+ // oneDNN takes gamma and beta as one SCALE_SHIFT tensor when both scale and
shift are used. In
// mxnet scale is called gamma and shift is called beta.
constexpr size_t gammaAndBeta = 2;
CHECK_EQ(gamma.shape()[0], beta.shape()[0]);
diff --git a/src/operator/nn/dnnl/dnnl_pooling.cc
b/src/operator/nn/dnnl/dnnl_pooling.cc
index 252bf05..4452951 100644
--- a/src/operator/nn/dnnl/dnnl_pooling.cc
+++ b/src/operator/nn/dnnl/dnnl_pooling.cc
@@ -48,7 +48,7 @@ void DNNLPoolingFwd::Init(const mxnet::NDArray& input,
if (alg_kind != dnnl::algorithm::pooling_max && alg_kind !=
dnnl::algorithm::pooling_avg &&
alg_kind != dnnl::algorithm::pooling_avg_include_padding &&
alg_kind != dnnl::algorithm::pooling_avg_exclude_padding) {
- LOG(FATAL) << "DNNL Pooling: algorithm is not supported";
+ LOG(FATAL) << "oneDNN Pooling: algorithm is not supported";
}
dnnl::prop_kind prop = dnnl::prop_kind::forward_scoring;
@@ -56,7 +56,7 @@ void DNNLPoolingFwd::Init(const mxnet::NDArray& input,
prop = dnnl::prop_kind::forward_training;
}
if (is_train && prop == dnnl::prop_kind::forward_scoring) {
- LOG(INFO) << "DNNL Pooling: training with prop_kind is forward_scoring";
+ LOG(INFO) << "oneDNN Pooling: training with prop_kind is forward_scoring";
}
const auto fwd_desc =
@@ -87,7 +87,7 @@ void DNNLPoolingFwd::Execute(const NDArray& in_data,
auto engine = CpuEngine::Get()->get_engine();
if (workspace == nullptr) {
- LOG(FATAL) << "DNNL Pooling: incorrect workspace input";
+ LOG(FATAL) << "oneDNN Pooling: incorrect workspace input";
}
auto ws = std::make_shared<dnnl::memory>(
@@ -99,7 +99,7 @@ void DNNLPoolingFwd::Execute(const NDArray& in_data,
CommitOutput(out_data, output_mem_t_);
DNNLStream::Get()->Submit();
} else {
- LOG(FATAL) << "DNNL Pooling: forward primitive is nullptr";
+ LOG(FATAL) << "oneDNN Pooling: forward primitive is nullptr";
}
}
@@ -116,7 +116,7 @@ dnnl::algorithm GetDNNLPoolAlgo(const PoolingParam& param) {
}
break;
default:
- LOG(FATAL) << "DNNL Pooling: Unknown pooling method.";
+ LOG(FATAL) << "oneDNN Pooling: Unknown pooling method.";
return dnnl::algorithm::pooling_max;
}
}
diff --git a/src/operator/nn/dnnl/dnnl_rnn.cc b/src/operator/nn/dnnl/dnnl_rnn.cc
index 051de78..22b9e27 100644
--- a/src/operator/nn/dnnl/dnnl_rnn.cc
+++ b/src/operator/nn/dnnl/dnnl_rnn.cc
@@ -145,7 +145,7 @@ DNNLRnnFullParam DNNLRnnFullParamParser(const RNNParam&
rnn_param,
void DNNLRnnMemMgr::Init(dim_t size, const Context& ctx) {
workspace_ = NDArray(TShape({size}), ctx, false, mshadow::kUint8);
if (workspace_.data().dptr_ == nullptr)
- LOG(FATAL) << "DNNL RNN operator memory allocation error.";
+ LOG(FATAL) << "oneDNN RNN operator memory allocation error.";
curr_mem = static_cast<char*>(workspace_.data().dptr_);
mem_size = size;
curr_size = size;
@@ -1265,7 +1265,7 @@ void DNNLRnnOp::Backward(const OpContext& ctx,
}
// Fetch weights, src and dst from Forward layer
if (bwd_vec_.size() != fwd_trn_vec_.size())
- LOG(FATAL) << "DNNL RNN fusion error.";
+ LOG(FATAL) << "oneDNN RNN fusion error.";
for (size_t lyr = 0; lyr < bwd_vec_.size(); ++lyr) {
bwd_vec_.at(lyr).FetchDataWeightsMem(fwd_trn_vec_.at(lyr));
bwd_vec_.at(lyr).SetWeightsGradsMem();
diff --git a/src/operator/quantization/dnnl/dnnl_quantize-inl.h
b/src/operator/quantization/dnnl/dnnl_quantize-inl.h
index 7a53ab1..13f2e1e 100644
--- a/src/operator/quantization/dnnl/dnnl_quantize-inl.h
+++ b/src/operator/quantization/dnnl/dnnl_quantize-inl.h
@@ -58,7 +58,7 @@ static void DNNLQuantizeComputeKer(const
std::vector<NDArray>& inputs,
*outputs[1].data().dptr<float>() = -real_range;
*outputs[2].data().dptr<float>() = real_range;
} else {
- LOG(FATAL) << "dnnl quantize op only supports int8 and uint8 as output
type";
+ LOG(FATAL) << "oneDNN quantize op only supports int8 and uint8 as output
type";
}
float scale = quantized_range / real_range;
dnnl::primitive_attr attr;
@@ -101,7 +101,7 @@ static void DNNLQuantizeCompute(const nnvm::NodeAttrs&
attrs,
} else if (param.out_type == mshadow::kInt8) {
DNNLQuantizeComputeKer<float, int8_t>(inputs, outputs, param, req);
} else {
- LOG(FATAL) << "dnnl quantize op only supports int8 and uint8 as output
type";
+ LOG(FATAL) << "oneDNN quantize op only supports int8 and uint8 as output
type";
}
}
diff --git a/src/operator/quantization/dnnl/dnnl_quantize_v2-inl.h
b/src/operator/quantization/dnnl/dnnl_quantize_v2-inl.h
index 1acc8a5..6181132 100644
--- a/src/operator/quantization/dnnl/dnnl_quantize_v2-inl.h
+++ b/src/operator/quantization/dnnl/dnnl_quantize_v2-inl.h
@@ -128,7 +128,7 @@ void SgDNNLQuantizeOperator::Forward(const OpContext& ctx,
*outputs[1].data().dptr<float>() = -real_range;
*outputs[2].data().dptr<float>() = real_range;
} else {
- LOG(FATAL) << "dnnl quantize op only supports int8 and uint8 as output
type";
+ LOG(FATAL) << "oneDNN quantize op only supports int8 and uint8 as output
type";
}
if (!initalized_) {
diff --git a/src/operator/quantization/dnnl/dnnl_requantize-inl.h
b/src/operator/quantization/dnnl/dnnl_requantize-inl.h
index 5eea9dc..2dc61d6 100644
--- a/src/operator/quantization/dnnl/dnnl_requantize-inl.h
+++ b/src/operator/quantization/dnnl/dnnl_requantize-inl.h
@@ -142,7 +142,7 @@ static void DNNLRequantizeForward(const nnvm::NodeAttrs&
attrs,
} else if (out_type == mshadow::kInt8) {
DNNLRequantizeForwardKer<int8_t>(attrs, ctx, inputs, req, outputs,
real_range);
} else {
- LOG(FATAL) << "dnnl requantize op only supports int8 and uint8 as output
type";
+ LOG(FATAL) << "oneDNN requantize op only supports int8 and uint8 as output
type";
}
}
diff --git a/src/operator/quantization/quantized_batch_norm.cc
b/src/operator/quantization/quantized_batch_norm.cc
index 9b1fd2a..009d6be 100644
--- a/src/operator/quantization/quantized_batch_norm.cc
+++ b/src/operator/quantization/quantized_batch_norm.cc
@@ -70,7 +70,7 @@ bool QuantizedBatchNormType(const nnvm::NodeAttrs& attrs,
#if MXNET_USE_ONEDNN == 1
CHECK(in_type->at(0) == mshadow::kInt8 || in_type->at(0) == mshadow::kUint8)
- << "QuantizedBatchNorm with DNNL backend only supports int8/uint8 input,
while "
+ << "QuantizedBatchNorm with oneDNN backend only supports int8/uint8
input, while "
<< in_type->at(0) << " is given.";
#else
TYPE_ASSIGN_CHECK(*in_type, 0, mshadow::kInt8);
diff --git a/src/operator/quantization/quantized_conv.cc
b/src/operator/quantization/quantized_conv.cc
index cd93ceb..95fbd3b 100644
--- a/src/operator/quantization/quantized_conv.cc
+++ b/src/operator/quantization/quantized_conv.cc
@@ -41,7 +41,7 @@ bool QuantizedConvShape(const nnvm::NodeAttrs& attrs,
if (param.layout.has_value()) {
#if MXNET_USE_ONEDNN == 1
CHECK(param.layout.value() == mshadow::kNCHW || param.layout.value() ==
mshadow::kNCDHW)
- << "dnnl quantized_conv now supports NCHW or NCDHW for now";
+ << "oneDNN quantized_conv only supports NCHW and NCDHW for now";
#else
CHECK_EQ(param.layout.value(), mshadow::kNCHW) << "quantized_conv only
supports NCHW for now";
#endif
@@ -55,9 +55,9 @@ bool QuantizedConvShape(const nnvm::NodeAttrs& attrs,
#if MXNET_USE_ONEDNN == 1
CHECK(kernel_ndims == 2U || kernel_ndims == 3U)
- << "dnnl quantized_conv only supports 2d or 3d kernel for now";
+ << "oneDNN quantized_conv only supports 2d and 3d kernel for now";
CHECK(data_ndims == 4U || data_ndims == 5U)
- << "dnnl quantized_conv only supports 4d or 5d layout for now";
+ << "oneDNN quantized_conv only supports 4d and 5d layout for now";
#else
CHECK_EQ(kernel_ndims, 2U) << "quantized_conv only supports 2D convolution
for now";
CHECK(param.dilate.ndim() == 0U || param.dilate.Size() == 1U)
diff --git a/src/operator/quantization/quantized_elemwise_add.cc
b/src/operator/quantization/quantized_elemwise_add.cc
index b314e9e..262f6e8 100644
--- a/src/operator/quantization/quantized_elemwise_add.cc
+++ b/src/operator/quantization/quantized_elemwise_add.cc
@@ -84,8 +84,8 @@ void QuantizedElemwiseAddForward(const nnvm::NodeAttrs& attrs,
const std::vector<TBlob>& in_data,
const std::vector<OpReqType>& req,
const std::vector<TBlob>& out_data) {
- LOG(FATAL) << "Not supported for MXNet built without DNNL. "
- "Please install DNNL enabled MXNet.";
+ LOG(FATAL) << "Not supported for MXNet built without oneDNN. "
+ "Please install oneDNN enabled MXNet.";
}
NNVM_REGISTER_OP(_contrib_quantized_elemwise_add)
diff --git a/src/operator/quantization/quantized_pooling.cc b/src/operator/quantization/quantized_pooling.cc
index 14ec43296..8736d03 100644
--- a/src/operator/quantization/quantized_pooling.cc
+++ b/src/operator/quantization/quantized_pooling.cc
@@ -44,12 +44,12 @@ bool QuantizedPoolingShape(const nnvm::NodeAttrs& attrs,
#if MXNET_USE_ONEDNN == 1
CHECK(data_ndims == 4U || data_ndims == 5U)
- << "DNNL QuantizedPoolingOp only supports 4D/5D layout yet, input should be 4D in"
+ << "oneDNN QuantizedPoolingOp only supports 4D/5D layout for now, input should be 4D in "
<< "(batch, channel, y, x) or 5D in (batch, channel, d, y, x)";
CHECK(layout == mshadow::kNCHW || layout == mshadow::kNCDHW)
- << "DNNL QuantizedPoolingOp only supports NCHW/NCDHW layout for now, saw " << layout;
+ << "oneDNN QuantizedPoolingOp only supports NCHW/NCDHW layout for now, saw " << layout;
CHECK(kernel_ndims == 2U || kernel_ndims == 3U)
- << "DNNL QuantizedPoolingOp only supports 2D/3D pooling for now, saw" << kernel_ndims;
+ << "oneDNN QuantizedPoolingOp only supports 2D/3D pooling for now, saw" << kernel_ndims;
#else
CHECK_EQ(data_ndims, 4U) << "quantized_pooling: Input data should be 4D in "
<< "(batch, channel, y, x)";
diff --git a/src/operator/subgraph/dnnl/dnnl_batch_dot_property.h b/src/operator/subgraph/dnnl/dnnl_batch_dot_property.h
index d2f33aa..c4dee3e 100644
--- a/src/operator/subgraph/dnnl/dnnl_batch_dot_property.h
+++ b/src/operator/subgraph/dnnl/dnnl_batch_dot_property.h
@@ -50,7 +50,7 @@ class SgDNNLBatchDotSelector : public SubgraphSelector {
class SgDNNLBatchDotProperty : public SubgraphProperty {
public:
static SubgraphPropertyPtr Create() {
- static const std::string& name = "DNNL Batch Dot optimization pass";
+ static const std::string& name = "oneDNN Batch Dot optimization pass";
auto property = std::make_shared<SgDNNLBatchDotProperty>();
property->SetAttr<std::string>("property_name", name);
property->SetAttr<bool>("inference_only", true);
diff --git a/src/operator/subgraph/dnnl/dnnl_conv.cc b/src/operator/subgraph/dnnl/dnnl_conv.cc
index bc1f6fd..7bc1b24 100644
--- a/src/operator/subgraph/dnnl/dnnl_conv.cc
+++ b/src/operator/subgraph/dnnl/dnnl_conv.cc
@@ -321,7 +321,7 @@ void SgDNNLConvOperator::Forward(const OpContext& ctx,
if (dnnl_param.with_act &&
full_conv_param.act_param.alg == dnnl::algorithm::eltwise_bounded_relu) {
if (dnnl_param.with_sum) {
- LOG(ERROR) << "dnnl doesn't support conv + relu + sum fusion yet.";
+ LOG(ERROR) << "oneDNN doesn't support conv + relu + sum fusion yet.";
full_conv_param.act_param.alpha *= output_scale;
} else {
// For conv+relu6 without sum, we don't need post_ops as output_scale can do the cut off.
diff --git a/src/operator/subgraph/dnnl/dnnl_fc.cc b/src/operator/subgraph/dnnl/dnnl_fc.cc
index 44c1a35..51989ca 100644
--- a/src/operator/subgraph/dnnl/dnnl_fc.cc
+++ b/src/operator/subgraph/dnnl/dnnl_fc.cc
@@ -56,7 +56,7 @@ class SgDNNLFCOp {
const std::vector<NDArray>& inputs,
const std::vector<OpReqType>& req,
const std::vector<NDArray>& outputs) {
- LOG(FATAL) << "Not implemented: subgraph dnnl fully connected only supports "
+ LOG(FATAL) << "Not implemented: subgraph oneDNN fully connected only supports "
"inference computation.";
}
diff --git a/src/operator/subgraph/dnnl/dnnl_matmul_post_quantize_property.h b/src/operator/subgraph/dnnl/dnnl_matmul_post_quantize_property.h
index 6fbd97f..6c384a1 100644
--- a/src/operator/subgraph/dnnl/dnnl_matmul_post_quantize_property.h
+++ b/src/operator/subgraph/dnnl/dnnl_matmul_post_quantize_property.h
@@ -136,7 +136,7 @@ class SgDNNLMatmulPostQuantizeProperty : public SubgraphProperty {
}
static SubgraphPropertyPtr Create() {
- static const std::string& name = "DNNL Matmul post-quantization optimization pass";
+ static const std::string& name = "oneDNN Matmul post-quantization optimization pass";
auto property = std::make_shared<SgDNNLMatmulPostQuantizeProperty>();
property->SetAttr<std::string>("property_name", name);
property->SetAttr<bool>("inference_only", true);
diff --git a/src/operator/tensor/cast_storage-inl.h b/src/operator/tensor/cast_storage-inl.h
index 7c6f83a..ee32915 100644
--- a/src/operator/tensor/cast_storage-inl.h
+++ b/src/operator/tensor/cast_storage-inl.h
@@ -445,8 +445,8 @@ inline bool CastStorageInferStorageType(const nnvm::NodeAttrs& attrs,
// dns -> dns
DispatchMode mode = DispatchMode::kFCompute;
#if MXNET_USE_ONEDNN == 1
- // If we use DNNL and the arrays are in CPU memory, the array may store
- // DNNL layout, we should convert its layout explicitly.
+ // If we use oneDNN and the arrays are in CPU memory, the array may store
+ // oneDNN layout, we should convert its layout explicitly.
if (dev_mask == kCPU)
mode = DispatchMode::kFComputeEx;
#endif
diff --git a/src/operator/tensor/elemwise_unary_op.h b/src/operator/tensor/elemwise_unary_op.h
index f516a78..5d23c98 100644
--- a/src/operator/tensor/elemwise_unary_op.h
+++ b/src/operator/tensor/elemwise_unary_op.h
@@ -399,8 +399,8 @@ class UnaryOp : public OpBase {
});
} break;
case kWriteInplace:
-// cannot check if ptrs are the same for DNNL because we may have
-// created copies of input when reordering. WriteInPlace will still write to original array
+// cannot check if ptrs are the same for oneDNN because we may have created
+// copies of input when reordering. WriteInPlace will still write to original array
#if MXNET_USE_ONEDNN == 0
CHECK_EQ(inputs[0].dptr_, outputs[0].dptr_);
#endif
diff --git a/tests/cpp/include/test_dnnl.h b/tests/cpp/include/test_dnnl.h
index 359a0f2..7172b0b 100644
--- a/tests/cpp/include/test_dnnl.h
+++ b/tests/cpp/include/test_dnnl.h
@@ -400,17 +400,17 @@ inline std::vector<NDArrayAttrs> GetTestInputArrays(int types = A
// Type 2, 3.
arr = NDArray(shape, Context());
if (shape.ndim() == md.data.ndims && IsSameShape(md, shape) && types & ArrayTypes::DNNL) {
- desc_str = "DNNL NDArray";
+ desc_str = "oneDNN NDArray";
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr, desc_str);
} else if (shape.ndim() == md.data.ndims && !IsSameShape(md, shape) && types & ArrayTypes::DNNLDiffShape) {
- desc_str = "DNNL NDArray with different shape";
+ desc_str = "oneDNN NDArray with different shape";
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr, desc_str);
} else if (shape.ndim() != md.data.ndims && types & ArrayTypes::DNNLDiffDim) {
std::stringstream ss;
- ss << "DNNL NDArray with different dim " << shape.ndim() << "/" << md.data.ndims;
+ ss << "oneDNN NDArray with different dim " << shape.ndim() << "/" << md.data.ndims;
desc_str = ss.str();
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr, desc_str);
@@ -420,17 +420,17 @@ inline std::vector<NDArrayAttrs> GetTestInputArrays(int types = A
arr = NDArray(shape, Context());
if (shape.ndim() == md.data.ndims && IsSameShape(md, shape) && types & ArrayTypes::DNNLReshaped) {
- desc_str = "Reshaped DNNL NDArray";
+ desc_str = "Reshaped oneDNN NDArray";
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr.Slice(slice_amount, arr.shape()[0] - slice_amount), desc_str);
} else if (shape.ndim() == md.data.ndims && !IsSameShape(md, shape) && types & ArrayTypes::DNNLReshapedDiffShape) {
- desc_str = "Reshaped DNNL NDArray with different shape";
+ desc_str = "Reshaped oneDNN NDArray with different shape";
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr.Slice(slice_amount, arr.shape()[0] - slice_amount), desc_str);
} else if (shape.ndim() != md.data.ndims && types & ArrayTypes::DNNLReshapedDiffDim) {
std::stringstream ss;
- ss << "DNNL NDArray with different dim " << shape.ndim() << "/" << md.data.ndims;
+ ss << "oneDNN NDArray with different dim " << shape.ndim() << "/" << md.data.ndims;
desc_str = ss.str();
InitDNNLArray(&arr, md, rand, max);
in_arrs.emplace_back(arr.Slice(slice_amount, arr.shape()[0] - slice_amount), desc_str);
@@ -532,10 +532,10 @@ inline std::vector<NDArrayAttrs> GetTestOutputArrays(const mxnet::TShape& shp,
// Type 2, 3.
arr = NDArray(shape, Context());
- desc_str = "DNNL NDArray";
+ desc_str = "oneDNN NDArray";
if (shape.ndim() != md.data.ndims) {
std::stringstream ss;
- ss << "DNNL NDArray with different memory layout " << shape.ndim() << "/" << md.data.ndims;
+ ss << "oneDNN NDArray with different memory layout " << shape.ndim() << "/" << md.data.ndims;
desc_str = ss.str();
}
@@ -552,10 +552,10 @@ inline std::vector<NDArrayAttrs> GetTestOutputArrays(const mxnet::TShape& shp,
NDArray arr = NDArray(s, Context());
arr = arr.AsArray(shape, arr.dtype());
InitDNNLArray(&arr, md, rand, max);
- desc_str = "Reused DNNL NDArray";
+ desc_str = "Reused oneDNN NDArray";
if (shape.ndim() != md.data.ndims) {
std::stringstream ss;
- ss << "Reused DNNL NDArray with different memory layout " << shape.ndim() << "/"
+ ss << "Reused oneDNN NDArray with different memory layout " << shape.ndim() << "/"
<< md.data.ndims;
desc_str = ss.str();
}
diff --git a/tests/cpp/operator/dnnl_test.cc b/tests/cpp/operator/dnnl_test.cc
index 84b1a5a..99ed3c0 100644
--- a/tests/cpp/operator/dnnl_test.cc
+++ b/tests/cpp/operator/dnnl_test.cc
@@ -164,7 +164,7 @@ TEST(DNNL_NDArray, GetDataReorder) {
printf("Init array (");
for (size_t i = 0; i < s.ndim(); i++)
printf("%ld, ", s[i]);
- printf(") with DNNL memory (");
+ printf(") with oneDNN memory (");
for (int i = 0; i < md.data.ndims; i++)
printf("%ld, ", md.data.dims[i]);
printf("), format: %d\n", static_cast<int>(GetDefaultFormat(md)));
diff --git a/tests/nightly/test_np_large_array.py b/tests/nightly/test_np_large_array.py
index ba9369a..d415c8a 100644
--- a/tests/nightly/test_np_large_array.py
+++ b/tests/nightly/test_np_large_array.py
@@ -2066,7 +2066,7 @@ def test_rnn_dim_check():
@use_np
[email protected](reason='runs without DNNL, wtih is not default behavior')
[email protected](reason='runs without oneDNN, which is not default behavior')
def test_rnn_vanilla():
L_SEQ, BAT, L_INP, L_STA = 2**20, 4, 2**10, 2
def batch_check(x, modes, params):
diff --git a/tests/python/dnnl/subgraphs/test_conv_subgraph.py b/tests/python/dnnl/subgraphs/test_conv_subgraph.py
index 0b0840c..6b6169b 100644
--- a/tests/python/dnnl/subgraphs/test_conv_subgraph.py
+++ b/tests/python/dnnl/subgraphs/test_conv_subgraph.py
@@ -446,10 +446,10 @@ def test_deduplication(data_shape, reverse_sum_order, model_name):
model_dedup.initialize()
model_no_dedup = copy.copy(model_dedup)
- model_dedup.optimize_for(data_nd, backend='DNNL', dedup_subgraph = True, skip_infer = True)
+ model_dedup.optimize_for(data_nd, backend='ONEDNN', dedup_subgraph = True, skip_infer = True)
out = model_dedup(data_nd)
- model_dedup.optimize_for(data_nd, backend='DNNL', dedup_subgraph = False, skip_infer = True)
+ model_dedup.optimize_for(data_nd, backend='ONEDNN', dedup_subgraph = False, skip_infer = True)
out_dedup = model_no_dedup(data_nd)
assert_almost_equal(out.asnumpy(), out_dedup.asnumpy(), rtol=1e-3, atol=1e-1)
@@ -776,7 +776,7 @@ def test_bn_relu_fusion(axis):
out1 = net(dummy_data)
out1.wait_to_read()
- net.optimize_for(dummy_data, backend='DNNL')
+ net.optimize_for(dummy_data, backend='ONEDNN')
out2 = net(dummy_data)
assert_almost_equal(out1, out2)
diff --git a/tests/python/gpu/test_gluon_model_zoo_gpu.py b/tests/python/gpu/test_gluon_model_zoo_gpu.py
index 18d42df..4e4d3c6 100644
--- a/tests/python/gpu/test_gluon_model_zoo_gpu.py
+++ b/tests/python/gpu/test_gluon_model_zoo_gpu.py
@@ -97,14 +97,14 @@ def get_nn_model(name):
else:
return get_model(name)
-# Seed 1521019752 produced a failure on the Py2 DNNL-GPU CI runner
+# Seed 1521019752 produced a failure on the Py2 oneDNN-GPU CI runner
# on 2/16/2018 that was not reproducible. Problem could be timing related or
# based on non-deterministic algo selection.
@mx.util.use_np
@pytest.mark.serial
def test_training():
# We use network models without dropout for testing.
- # TODO(zhengda) mobilenet can't pass this test even without DNNL.
+ # TODO(zhengda) mobilenet can't pass this test even without oneDNN.
all_models = ['resnet18_v1', 'densenet121']
batch_size = 10
diff --git a/tests/python/quantization/test_quantization.py b/tests/python/quantization/test_quantization.py
index 2360347..8f03c84 100644
--- a/tests/python/quantization/test_quantization.py
+++ b/tests/python/quantization/test_quantization.py
@@ -218,7 +218,7 @@ def test_quantized_conv():
return
elif is_test_for_dnnl():
# (TODO)Xinyu: https://github.com/apache/incubator-mxnet/issues/16830
- print('skipped testing quantized_conv for dnnl cpu since it is a flaky case')
+ print('skipped testing quantized_conv for oneDNN cpu since it is a flaky case')
return
elif qdtype == 'uint8' and is_test_for_gpu():
print('skipped testing quantized_conv for gpu uint8 since it is not supported yet')
@@ -823,7 +823,7 @@ def test_quantized_act():
print('skipped testing quantized_act for native cpu since it is not supported yet')
return
elif qdtype == 'int8' and is_test_for_dnnl():
- print('skipped testing quantized_act for dnnl cpu int8 since it is not supported yet')
+ print('skipped testing quantized_act for oneDNN cpu int8 since it is not supported yet')
return
elif is_test_for_gpu():
print('skipped testing quantized_act for gpu since it is not supported yet')
@@ -1058,7 +1058,7 @@ def test_quantize_model():
print('skipped testing quantize_model for native cpu since it is not supported yet')
return True
elif qdtype == 'int8' and is_test_for_dnnl():
- print('skipped testing quantize_model for dnnl cpu int8 since it is not supported yet')
+ print('skipped testing quantize_model for oneDNN cpu int8 since it is not supported yet')
return True
elif qdtype == 'uint8' and is_test_for_gpu():
print('skipped testing quantize_model for gpu uint8 since it is not supported yet')
@@ -1070,7 +1070,7 @@ def test_quantize_model():
print('skipped testing quantize_model for native cpu since it is not supported yet')
return
elif qdtype == 'int8' and is_test_for_dnnl():
- print('skipped testing quantize_model for dnnl cpu int8 since it is not supported yet')
+ print('skipped testing quantize_model for oneDNN cpu int8 since it is not supported yet')
return
elif qdtype == 'uint8' and is_test_for_gpu():
print('skipped testing quantize_model for gpu uint8 since it is not supported yet')
diff --git a/tests/python/unittest/test_numpy_gluon.py b/tests/python/unittest/test_numpy_gluon.py
index 1241ead..0be4cad 100644
--- a/tests/python/unittest/test_numpy_gluon.py
+++ b/tests/python/unittest/test_numpy_gluon.py
@@ -434,7 +434,7 @@ def test_optimize_for():
out = net(a)
b = net.collect_params().pop('d.weight').data()
- net.optimize_for(a, b, backend="DNNL")
+ net.optimize_for(a, b, backend="ONEDNN")
out2 = net(a)
diff --git a/tools/dependencies/README.md b/tools/dependencies/README.md
index acc5d92..9ad6d78 100644
--- a/tools/dependencies/README.md
+++ b/tools/dependencies/README.md
@@ -52,12 +52,12 @@ MXNet is built on top of many dependencies. Managing these dependencies could be
## Overview
-The dependencies could be categorized by several groups: BLAS libraries, CPU-based performance boost library, i.e. ONEDNN and GPU-based performance boosting library including CUDA, cuDNN, NCCL. and others including OpenCV, Numpy, S3-related, PS-lite dependencies. The list below shows all the dependencies and their version. Except for CUDA, cuDNN, NCCL which the user is required to install on their environments, we statically link those dependencies into libmxnet.so when we build PyPi pac [...]
+The dependencies could be categorized by several groups: BLAS libraries, CPU-based performance boost library, i.e. oneDNN and GPU-based performance boosting library including CUDA, cuDNN, NCCL. and others including OpenCV, Numpy, S3-related, PS-lite dependencies. The list below shows all the dependencies and their version. Except for CUDA, cuDNN, NCCL which the user is required to install on their environments, we statically link those dependencies into libmxnet.so when we build PyPi pac [...]
| Dependencies | MXNet Version |
| :------------: |:-------------:|
|OpenBLAS| 0.3.9 |
-|ONEDNN| 2.0 |
+|oneDNN| 2.3.2 |
|CUDA| 10.1 |
|cuDNN| 7.5.1 |
|NCCL| 2.4.2 |
@@ -105,7 +105,7 @@ sudo apt-get install -y git \
pkg-config
```
-### MKL, ONEDNN
+### MKL, oneDNN
@pengzhao-intel (https://github.com/apache/incubator-mxnet/commits?author=pengzhao-intel) and his team are tracking and updating these versions. Kudos to them!
diff --git a/tools/pip/doc/CPU_ADDITIONAL.md b/tools/pip/doc/CPU_ADDITIONAL.md
index 6cb82b8..7aa6a95 100644
--- a/tools/pip/doc/CPU_ADDITIONAL.md
+++ b/tools/pip/doc/CPU_ADDITIONAL.md
@@ -26,7 +26,7 @@ This package supports Linux, Mac OSX, and Windows platforms. You may also want t
- [mxnet-cu102](https://pypi.python.org/pypi/mxnet-cu102/) with CUDA-10.2 support.
- [mxnet-cu101](https://pypi.python.org/pypi/mxnet-cu101/) with CUDA-10.1 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To use this package on Linux you need the `libquadmath.so.0` shared library. On Debian based systems, including Ubuntu, run `sudo apt install libquadmath0` to
diff --git a/tools/pip/doc/CU101_ADDITIONAL.md b/tools/pip/doc/CU101_ADDITIONAL.md
index bcf0be7..3d92b11 100644
--- a/tools/pip/doc/CU101_ADDITIONAL.md
+++ b/tools/pip/doc/CU101_ADDITIONAL.md
@@ -25,7 +25,7 @@ This package supports Linux and Windows platforms. You may also want to check:
- [mxnet-cu110](https://pypi.python.org/pypi/mxnet-cu110/) with CUDA-11.0 support.
- [mxnet-cu102](https://pypi.python.org/pypi/mxnet-cu102/) with CUDA-10.2 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To download CUDA, check [CUDA download](https://developer.nvidia.com/cuda-downloads). For more instructions, check [CUDA Toolkit online documentation](http://docs.nvidia.com/cuda/index.html).
diff --git a/tools/pip/doc/CU102_ADDITIONAL.md b/tools/pip/doc/CU102_ADDITIONAL.md
index a227957..1f580bf 100644
--- a/tools/pip/doc/CU102_ADDITIONAL.md
+++ b/tools/pip/doc/CU102_ADDITIONAL.md
@@ -25,7 +25,7 @@ This package supports Linux and Windows platforms. You may also want to check:
- [mxnet-cu110](https://pypi.python.org/pypi/mxnet-cu110/) with CUDA-11.0 support.
- [mxnet-cu101](https://pypi.python.org/pypi/mxnet-cu101/) with CUDA-10.1 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To download CUDA, check [CUDA download](https://developer.nvidia.com/cuda-downloads). For more instructions, check [CUDA Toolkit online documentation](http://docs.nvidia.com/cuda/index.html).
diff --git a/tools/pip/doc/CU110_ADDITIONAL.md b/tools/pip/doc/CU110_ADDITIONAL.md
index f78a945..8774b76 100644
--- a/tools/pip/doc/CU110_ADDITIONAL.md
+++ b/tools/pip/doc/CU110_ADDITIONAL.md
@@ -25,7 +25,7 @@ This package supports Linux and Windows platforms. You may also want to check:
- [mxnet-cu102](https://pypi.python.org/pypi/mxnet-cu102/) with CUDA-10.2 support.
- [mxnet-cu101](https://pypi.python.org/pypi/mxnet-cu101/) with CUDA-10.1 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To download CUDA, check [CUDA download](https://developer.nvidia.com/cuda-downloads). For more instructions, check [CUDA Toolkit online documentation](http://docs.nvidia.com/cuda/index.html).
diff --git a/tools/pip/doc/CU112_ADDITIONAL.md b/tools/pip/doc/CU112_ADDITIONAL.md
index 37686ab..340ca13 100644
--- a/tools/pip/doc/CU112_ADDITIONAL.md
+++ b/tools/pip/doc/CU112_ADDITIONAL.md
@@ -25,7 +25,7 @@ This package supports Linux and Windows platforms. You may also want to check:
- [mxnet-cu102](https://pypi.python.org/pypi/mxnet-cu102/) with CUDA-10.2 support.
- [mxnet-cu101](https://pypi.python.org/pypi/mxnet-cu101/) with CUDA-10.1 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To download CUDA, check [CUDA download](https://developer.nvidia.com/cuda-downloads). For more instructions, check [CUDA Toolkit online documentation](http://docs.nvidia.com/cuda/index.html).
diff --git a/tools/pip/doc/NATIVE_ADDITIONAL.md b/tools/pip/doc/NATIVE_ADDITIONAL.md
index 36de931..4a303e8 100644
--- a/tools/pip/doc/NATIVE_ADDITIONAL.md
+++ b/tools/pip/doc/NATIVE_ADDITIONAL.md
@@ -26,7 +26,7 @@ This package supports Linux and Windows platforms. You may also want to check:
- [mxnet-cu102](https://pypi.python.org/pypi/mxnet-cu102/) with CUDA-10.2 support.
- [mxnet-cu101](https://pypi.python.org/pypi/mxnet-cu101/) with CUDA-10.1 support.
- [mxnet](https://pypi.python.org/pypi/mxnet/).
-- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without ONEDNN.
+- [mxnet-native](https://pypi.python.org/pypi/mxnet-native/) CPU variant without oneDNN.
To download CUDA, check [CUDA download](https://developer.nvidia.com/cuda-downloads). For more instructions, check [CUDA Toolkit online documentation](http://docs.nvidia.com/cuda/index.html).
diff --git a/tools/staticbuild/README.md b/tools/staticbuild/README.md
index 087fbf4..c7fb62b 100644
--- a/tools/staticbuild/README.md
+++ b/tools/staticbuild/README.md
@@ -33,13 +33,13 @@ Ubuntu systems.
```
tools/staticbuild/build.sh cu112
```
-This would build the mxnet package based on CUDA 11.2. Currently, we support variants cpu, native, cu101, cu102, cu110, and cu112. All of these variants expect native have ONEDNN backend enabled.
+This would build the mxnet package based on CUDA 11.2. Currently, we support variants cpu, native, cu101, cu102, cu110, and cu112. All of these variants expect native have oneDNN backend enabled.
```
tools/staticbuild/build.sh cpu
```
-This would build the mxnet package based on ONEDNN.
+This would build the mxnet package based on oneDNN.
As the result, users would have a complete static dependencies in `/staticdeps` in the root folder as well as a static-linked `libmxnet.so` file lives in `lib`. You can build your language binding by using the `libmxnet.so`.