szha commented on a change in pull request #20473:
URL: https://github.com/apache/incubator-mxnet/pull/20473#discussion_r694319553



##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+
+# Gluon2.0: Migration Guide
+
+## Overview
+Since the introduction of the Gluon API in MXNet 1.x, it has superceded 
commonly used symbolic, module and model APIs for model development. In fact, 
Gluon was the first in deep learning community to unify the flexibility of 
imperative programming with the performance benefits of symbolic programming, 
through just-in-time compilation. 

Review comment:
       ```suggestion
   Since the introduction of the Gluon API in MXNet 1.x, it has superseded 
commonly used symbolic, module and model APIs for model development. In fact, 
Gluon was the first in the deep learning community to unify the flexibility of 
imperative programming with the performance benefits of symbolic programming, 
through just-in-time compilation. 
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+In Gluon2.0, we extend the support to MXNet numpy and numpy extension with a simplified interface and new functionalities: 
+
+- **Simplified hybridization with deferred compute and tracing**: Deferred 
compute allows the imperative execution to be used for graph construction, 
which allows us to unify the historic divergence of NDArray and Symbol. 
Hybridization now works in a simplified hybrid forward interface. Users only 
need to specify the computation through imperative programming. Hybridization 
also works through tracing, i.e. tracing the data flow of the first input data 
to create graph.

Review comment:
       ```suggestion
   - **Simplified hybridization with deferred compute and tracing**: Deferred 
compute allows the imperative execution to be used for graph construction, 
which allows us to unify the historic divergence of NDArray and Symbol. 
Hybridization now works in a simplified hybrid forward interface. Users only 
need to specify the computation through imperative programming. Hybridization 
also works through tracing, i.e. tracing the data flow of the first input data 
to create a graph.
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+
+- **Data 2.0**: The new design for data loading in Gluon allows hybridizing and deploying the data processing pipeline in the same way as model hybridization. The new C++ data loader improves data loading efficiency on CIFAR-10 by 50%.
+
+- **Distributed 2.0**: The new distributed-training design in Gluon 2.0 
provides a unified distributed data parallel interface across native Parameter 
Server, BytePS, and Horovod, and is extensible for supporting custom 
distributed training libraries.
+
+- **Gluon Probability**: parameterizable probability distributions and sampling functions to facilitate more areas of research such as Bayesian methods and AutoML.
+
+- **Gluon Metrics** and **Optimizers**: refactored with the MXNet numpy interface, with legacy issues addressed.
+
+Adopting these new functionalities may or may not require modifications to your models. But don't worry: this migration guide will go through a high-level mapping from old functionality to new APIs and make Gluon2.0 migration a hassle-free experience.
+
+## Data Pipeline
+**What's new**: In Gluon2.0, `MultithreadingDataLoader` is introduced to speed up the data loading pipeline. It uses the pure MXNet C++ implementation of the dataloader, datasets and batchify functions. So, in Gluon2.0 you can use either the MXNet internal multithreading mode dataloader or the Python multiprocessing mode dataloader. 
+
+**Migration Guide**: Users can continue with the traditional `gluon.data.DataLoader`, and the C++ backend will be applied automatically. 
+
+[Gluon2.0 dataloader](../../api/gluon/data/index.rst#mxnet.gluon.data.DataLoader) provides a new parameter called `try_nopython`. This parameter defaults to `None`; when set to `True`, the dataloader compiles the Python data loading pipeline into a pure MXNet C++ implementation. The compilation is not guaranteed to support all use cases, and it will fall back to Python in cases of failure such as: 
+
+- The dataset is not fully [supported by the backend](../../api/gluon/data/index.rst#mxnet.gluon.data.Dataset) (e.g., there are custom Python datasets).
+
+- The transform is not fully hybridizable. 
+
+- Batchify is not fully [supported by the backend](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/data/batchify.py).
+
+
+You can refer to [Step 5 of the Crash Course](https://mxnet.apache.org/versions/master/api/python/docs/tutorials/getting-started/crash-course/5-datasets.html#New-in-MXNet-2.0:-faster-C++-backend-dataloaders) for a detailed look at the performance increase with the C++ backend. 
+
+## Modeling
+In Gluon2.0, users will have a brand new modeling experience with 
NumPy-compatible APIs and deferred compute mechanism. 
+
+- **NumPy-compatible programming experience**: users can build their models with MXNet's implementation of the NumPy array library, NumPy-compatible math operators and some neural network extension operators. 
+
+- **Imperative-only coding experience**: with deferred compute and tracing introduced, users only need to specify the computation through imperative coding and can still make hybridization work. Users will no longer need to interact with symbol APIs. 
+
+To help users migrate smoothly to these simplified interfaces, we will provide the following guidance on how to replace legacy operators with NumPy-compatible operators, how to build models with `forward` instead of `hybrid_forward`, and how to use the `Parameter` class to register your parameters. 
+
+
+### NumPy-compatible Programming Experience
+#### NumPy Arrays
+MXNet [NumPy ndarray (i.e. `mx.np.ndarray`)](../../api/np/arrays.ndarray.html) is a multidimensional container of items of the same type and size. Most of its properties and attributes are the same as those of legacy NDArrays (i.e. `mx.nd.ndarray`), so users can use the NumPy array library just as they did with legacy NDArrays. But there are still some changes and deprecations that need attention, as mentioned below. 
+
+**Migration Guide**: 
+
+1. Currently, NumPy ndarray only supports the `default` storage type; other storage types, like `row_sparse` and `csr`, are not supported. Also, the `tostype()` method is deprecated. 
+
+2. Users can use the `as_np_ndarray` method to switch from a legacy NDArray to a NumPy ndarray just like this:
+    ```{.python}
+    import mxnet as mx
+    nd_array = mx.ones((5,3))
+    np_array = nd_array.as_np_ndarray()
+    ```
+
+3. Compared with legacy NDArray, some attributes are deprecated in NumPy ndarray. Listed below are some of the deprecated APIs and their corresponding replacements in NumPy ndarray; others can be found in [**Appendix/NumPy Array Deprecated Attributes**](#NumPy-Array-Deprecated-Attributes).
+    | Deprecated Attributes | NumPy ndarray Equivalent |
+    | --------------------- | ------------------------ |
+    | `a.asscalar()`        | `a.item()`               |
+    | `a.as_in_context()`   | `a.as_in_ctx()`          |
+    | `a.context`           | `a.ctx`                  |
+    | `a.reshape_like(b)`   | `a.reshape(b.shape)`     |
+    | `a.zeros_like(b)`     | `mx.np.zeros_like(b)`    |
+    | `a.ones_like(b)`      | `mx.np.ones_like(b)`     |
+4. Compared with legacy NDArray, some attributes have different behaviors and take different inputs. 
+    | Attribute | Legacy Inputs | NumPy Inputs |
+    | --------- | ------------- | ------------ |
+    | `a.reshape(*args, **kwargs)`  | **shape**: Some dimensions of the shape can take special values from the set {0, -1, -2, -3, -4}. <br> The significance of each is explained below: <br> ``0`` copy this dimension from the input to the output shape. <br> ``-1`` infers the dimension of the output shape by using the remainder of the input dimensions. <br> ``-2`` copy all/remainder of the input dimensions to the output shape. <br> ``-3`` use the product of two consecutive dimensions of the input shape as the output dimension. <br> ``-4`` split one dimension of the input into two dimensions passed subsequent to -4 in the shape (can contain -1). <br> **reverse**: If set to 1, then the special values are inferred from right to left. | **shape**: The shape parameter is a **positional argument** rather than a keyword argument. <br> Some dimensions of the shape can take special values from the set {-1, -2, -3, -4, -5, -6}. <br> The significance of each is explained below: <br> ``-1`` infers the dimension of the output shape by using the remainder of the input dimensions. <br> ``-2`` copy this dimension from the input to the output shape. <br> ``-3`` skip the current dimension if and only if the current dim size is one. <br> ``-4`` copy all remaining input dimensions to the output shape. <br> ``-5`` use the product of two consecutive dimensions of the input shape as the output. <br> ``-6`` split one dimension of the input into two dimensions passed subsequent to -6 in the new shape. <br> **reverse**: There is no **reverse** parameter for `np.reshape`, but there is one for `npx.reshape`. <br> **order**: Read the elements of `a` using this index order, and place the elements into the reshaped array using this index order. |
+
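Since `mx.np` follows the official NumPy API, the replacements in the two tables above behave like their plain NumPy counterparts. A quick sketch with standard NumPy (the same calls apply to `mx.np` ndarrays):

```python
import numpy as np  # mx.np mirrors this API

a = np.ones((1,))
scalar = a.item()                 # replaces a.asscalar()

b = np.arange(6).reshape(2, 3)    # shape passed positionally, not shape=(2, 3)
c = np.ones(6).reshape(b.shape)   # replaces a.reshape_like(b)

z = np.zeros_like(b)              # replaces a.zeros_like(b)
o = np.ones_like(b)               # replaces a.ones_like(b)
print(scalar, c.shape, z.shape)
```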
+
+#### NumPy and NumPy-extension Operators
+Most of the legacy NDArray operators(`mx.nd.op`) have the equivalent ones in 
np/npx namespace, users can just repalce them with `mx.np.op` or `mx.npx.op` to 
migrate. Some of the operators will have different inputs and behaviors as 
listed in the table below. 

Review comment:
       ```suggestion
   Most of the legacy NDArray operators(`mx.nd.op`) have the equivalent ones in 
np/npx namespace. Users can just replace them with `mx.np.op` or `mx.npx.op` to 
migrate. Some of the operators will have different inputs and behaviors as 
listed in the table below. 
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+**Migration Guide**:
+
+1. Operator migration with name/input changes
+    | Legacy Operators | NumPy Operators Equivalent | Changes |
+    | ---------------- | -------------------------- | ------- |
+    | `mx.nd.flatten(*args, **kwargs)` | `mx.npx.batch_flatten(*args, **kwargs)` | moved to `npx` namespace with new name `batch_flatten` |
+    | `mx.nd.concat(a, b, c)` | `mx.np.concatenate([a, b, c])` | - moved to `np` namespace with new name `concatenate` <br> - takes a list of ndarrays as input rather than positional ndarrays |
+    | `mx.nd.stack(a, b, c)` | `mx.np.stack([a, b, c])` | - moved to `np` namespace <br> - takes a list of ndarrays as input rather than positional ndarrays |
+    | `mx.nd.SliceChannel(*args, **kwargs)` | `mx.npx.slice_channel(*args, **kwargs)` | moved to `npx` namespace with new name `slice_channel` |
+    | `mx.nd.FullyConnected(*args, **kwargs)` | `mx.npx.fully_connected(*args, **kwargs)` | moved to `npx` namespace with new name `fully_connected` |
+    | `mx.nd.Activation(*args, **kwargs)` | `mx.npx.activation(*args, **kwargs)` | moved to `npx` namespace with new name `activation` |
+    | `mx.nd.elemwise_add(a, b)` | `a + b` | just use the ndarray Python operator |
+    | `mx.nd.elemwise_mul(a, b)` | `mx.np.multiply(a, b)` | use the `multiply` operator in the `np` namespace |
+
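    The list-input convention matches official NumPy, so it can be sanity-checked with plain NumPy (the same calls work on `mx.np` ndarrays):

```python
import numpy as np  # mx.np mirrors this API

a, b, c = np.ones(2), np.zeros(2), np.ones(2)

cat = np.concatenate([a, b, c])   # replaces mx.nd.concat(a, b, c)
stk = np.stack([a, b, c])         # replaces mx.nd.stack(a, b, c)

added = a + b                     # replaces mx.nd.elemwise_add(a, b)
prod = np.multiply(a, c)          # replaces mx.nd.elemwise_mul(a, c)
print(cat.shape, stk.shape)
```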
+2. Operator migration with multiple steps: `mx.nd.mean` -> `mx.np.mean`:
+```{.python}
+import mxnet as mx
+data = mx.nd.ones((2, 3, 4))
+# Legacy: mean with reduction over every axis except axis 1,
+# selected via the `exclude=1` option
+nd_mean = mx.nd.mean(data, axis=1, exclude=1)
+
+# NumPy: no `exclude` option, but users can perform the same steps as follows
+np_data = data.as_np_ndarray()
+axes = list(range(np_data.ndim))
+del axes[1]
+np_mean = mx.np.mean(np_data, axis=tuple(axes))
+```
+
+3. Random Operators
+    |                   Legacy Operators               |    NumPy Operators 
Equivalent    |   Changes  |
+    | ----------------------------------------------------- | 
------------------------------ | ---------------------------- |
+    |       `mx.random.uniform(-1.0, 1.0, shape=(2, 3))` <br> 
`mx.nd.random.uniform(-1.0, 1.0, shape=(2, 3))`                |            
`mx.np.random.uniform(-1.0, 1.0, size=(2, 3))`                    |             
   For all the NumPy random operators, use **size** key word instead of 
**shape**           |

Review comment:
       ```suggestion
       |       `mx.random.uniform(-1.0, 1.0, shape=(2, 3))` <br> 
`mx.nd.random.uniform(-1.0, 1.0, shape=(2, 3))`                |            
`mx.np.random.uniform(-1.0, 1.0, size=(2, 3))`                    |             
   For all the NumPy random operators, use **size** keyword instead of 
**shape**           |
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+
+# Gluon2.0: Migration Guide
+
+## Overview
+Since the introduction of the Gluon API in MXNet 1.x, it has superceded 
commonly used symbolic, module and model APIs for model development. In fact, 
Gluon was the first in deep learning community to unify the flexibility of 
imperative programming with the performance benefits of symbolic programming, 
through just-in-time compilation. 
+
+In Gluon2.0, we extend the support to MXNet numpy and numpy extension with 
simplified interface and new functionalities: 
+
+- **Simplified hybridization with deferred compute and tracing**: Deferred 
compute allows the imperative execution to be used for graph construction, 
which allows us to unify the historic divergence of NDArray and Symbol. 
Hybridization now works in a simplified hybrid forward interface. Users only 
need to specify the computation through imperative programming. Hybridization 
also works through tracing, i.e. tracing the data flow of the first input data 
to create graph.
+
+- **Data 2.0**: The new design for data loading in Gluon allows hybridizing 
and deploying data processing pipeline in the same way as model hybridization. 
The new C++ data loader improves data loading efficiency on CIFAR 10 by 50%.
+
+- **Distributed 2.0**: The new distributed-training design in Gluon 2.0 
provides a unified distributed data parallel interface across native Parameter 
Server, BytePS, and Horovod, and is extensible for supporting custom 
distributed training libraries.
+
+- **Gluon Probability**: parameterizable probability distributions and 
sampling functions to facilitate more areas of research such as Baysian methods 
and AutoML.
+
+- **Gluon Metrics** and **Optimizers**: refactored with MXNet numpy interface 
and addressed legacy issues.
+
+Adopting these new functionalities may or may not require modifications on 
your models. But don't worry, this migration guide will go through a high-level 
mapping from old functionality to new APIs and make Gluon2.0 migration a 
hassle-free experience.  
+
+## Data Pipeline
+**What's new**: In Gluon2.0, `MultithreadingDataLoader` is introduced to speed 
up the data loading pipeline. It will use the pure MXNet C++ implementation of 
dataloader, datasets and batchify functions. So, you can use either MXNet 
internal multithreading mode dataloader or python multiprocessing mode 
dataloader in Gluon2.0. 
+
+**Migration Guide**: Users can continue with the traditional 
gluon.data.Dataloader and the C++ backend will be applied automatically. 
+
+[Gluon2.0 
dataloader](../../api/gluon/data/index.rst#mxnet.gluon.data.DataLoader) will 
provide a new parameter called `try_nopython`. This parameter takes default 
value of None; when set to `True` the dataloader will compile python 
dataloading pipeline into pure MXNet c++ implementation. The compilation is not 
guaranteed to support all use cases, but it will fallback to python in case of 
failure: 
+
+- The dataset is not fully [supported by the backend](../../api/gluon/data/index.rst#mxnet.gluon.data.Dataset) (e.g., there are custom Python datasets).
+
+- The transform is not fully hybridizable. 
+
+- Batchify is not fully [supported by the backend](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/data/batchify.py).
+
+
+You can refer to [Step 5 in the Crash Course](https://mxnet.apache.org/versions/master/api/python/docs/tutorials/getting-started/crash-course/5-datasets.html#New-in-MXNet-2.0:-faster-C++-backend-dataloaders) for details of the performance increase with the C++ backend. 
+
+## Modeling
+In Gluon2.0, users will have a brand new modeling experience with NumPy-compatible APIs and the deferred compute mechanism. 
+
+- **NumPy-compatible programming experience**: users can build their models with MXNet's NumPy array library, NumPy-compatible math operators and some neural network extension operators. 
+
+- **Imperative-only coding experience**: with deferred compute and tracing introduced, users only need to specify the computation through imperative coding and can still make hybridization work. Users will no longer need to interact with symbol APIs. 
+
+To help users migrate smoothly to these simplified interfaces, we will provide the following guidance on how to replace legacy operators with NumPy-compatible operators, how to build models with `forward` instead of `hybrid_forward`, and how to use the `Parameter` class to register your parameters. 
+
+
+### NumPy-compatible Programming Experience
+#### NumPy Arrays
+MXNet [NumPy ndarray (i.e. `mx.np.ndarray`)](../../api/np/arrays.ndarray.html) is a multidimensional container of items of the same type and size. Most of its properties and attributes are the same as those of legacy NDArrays (i.e. `mx.nd.ndarray`), so users can use the NumPy array library just as they did with legacy NDArrays. But there are still some changes and deprecations that need attention, as mentioned below. 
+
+**Migration Guide**: 
+
+1. Currently, NumPy ndarray only supports the `default` storage type; other storage types, such as `row_sparse` and `csr`, are not supported. Also, the `tostype()` method is deprecated. 
+
+2. Users can use the `as_np_ndarray()` method to switch from a legacy NDArray to a NumPy ndarray, like this:
+    ```{.python}
+    import mxnet as mx
+    nd_array = mx.ones((5,3))
+    np_array = nd_array.as_np_ndarray()
+    ```
+
+3. Compared with legacy NDArray, some attributes are deprecated in NumPy 
ndarray. Listed below are some of the deprecated APIs and their corresponding 
replacements in NumPy ndarray, others can be found in [**Appendix/NumPy Array 
Deprecated Attributes**](#NumPy-Array-Deprecated-Attributes).
+    | Deprecated Attributes | NumPy ndarray Equivalent |
+    | --------------------- | ------------------------ |
+    | `a.asscalar()`        | `a.item()`               |
+    | `a.as_in_context()`   | `a.as_in_ctx()`          |
+    | `a.context`           | `a.ctx`                  |
+    | `a.reshape_like(b)`   | `a.reshape(b.shape)`     |
+    | `a.zeros_like(b)`     | `mx.np.zeros_like(b)`    |
+    | `a.ones_like(b)`      | `mx.np.ones_like(b)`     |
+
+4. Compared with legacy NDArray, some attributes have different behaviors and take different inputs. 
+    | Attribute | Legacy Inputs | NumPy Inputs |
+    | --- | --- | --- |
+    | `a.reshape(*args, **kwargs)` | **shape**: Some dimensions of the shape can take special values from the set {0, -1, -2, -3, -4}. The significance of each is explained below: <br> ``0`` copy this dimension from the input to the output shape. <br> ``-1`` infer the dimension of the output shape by using the remainder of the input dimensions. <br> ``-2`` copy all/remainder of the input dimensions to the output shape. <br> ``-3`` use the product of two consecutive dimensions of the input shape as the output dimension. <br> ``-4`` split one dimension of the input into two dimensions passed subsequent to -4 in the shape (can contain -1). <br> **reverse**: If set to 1, the special values are inferred from right to left. | **shape**: the shape parameter is a **positional argument** rather than a keyword argument. Some dimensions of the shape can take special values from the set {-1, -2, -3, -4, -5, -6}. The significance of each is explained below: <br> ``-1`` infer the dimension of the output shape by using the remainder of the input dimensions. <br> ``-2`` copy this dimension from the input to the output shape. <br> ``-3`` skip the current dimension if and only if the current dim size is one. <br> ``-4`` copy all remaining input dimensions to the output shape. <br> ``-5`` use the product of two consecutive dimensions of the input shape as the output dimension. <br> ``-6`` split one dimension of the input into two dimensions passed subsequent to -6 in the new shape. <br> **reverse**: `np.reshape` has no **reverse** parameter, but `npx.reshape` does. <br> **order**: Read the elements of `a` using this index order, and place the elements into the reshaped array using this index order. |
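Because MXNet's `np` module follows NumPy semantics, the replacements above can be illustrated with NumPy itself; this is a sketch of the shared semantics, not MXNet-specific code:

```python
import numpy as np

# `a.asscalar()` -> `a.item()`: extract a Python scalar from a one-element array
a = np.array([3.5])
scalar = a.item()

# `a.reshape_like(b)` -> `a.reshape(b.shape)`
b = np.zeros((2, 3))
reshaped = np.arange(6).reshape(b.shape)

# New-style reshape: shape is positional and -1 infers the remaining dimension
y = np.arange(12).reshape(3, -1)             # shape (3, 4)

# `order` controls the index order used to read and write elements
z = np.arange(12).reshape((3, 4), order='F')
```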
+
+
+#### NumPy and NumPy-extension Operators
+Most of the legacy NDArray operators (`mx.nd.op`) have equivalents in the `np`/`npx` namespaces; users can simply replace them with `mx.np.op` or `mx.npx.op` to migrate. Some operators have different inputs and behaviors, as listed in the table below. 
+
+**Migration Guide**:
+
+1. Operator migration with name/input changes
+    | Legacy Operators | NumPy Operators Equivalent | Changes |
+    | --- | --- | --- |
+    | `mx.nd.flatten(*args, **kwargs)` | `mx.npx.batch_flatten(*args, **kwargs)` | moved to `npx` namespace with new name `batch_flatten` |
+    | `mx.nd.concat(a, b, c)` | `mx.np.concatenate([a, b, c])` | - moved to `np` namespace with new name `concatenate`. <br> - takes a list of ndarrays as input rather than positional ndarrays |
+    | `mx.nd.stack(a, b, c)` | `mx.np.stack([a, b, c])` | - moved to `np` namespace. <br> - takes a list of ndarrays as input rather than positional ndarrays |
+    | `mx.nd.SliceChannel(*args, **kwargs)` | `mx.npx.slice_channel(*args, **kwargs)` | - moved to `npx` namespace with new name `slice_channel`. |
+    | `mx.nd.FullyConnected(*args, **kwargs)` | `mx.npx.fully_connected(*args, **kwargs)` | - moved to `npx` namespace with new name `fully_connected`. |
+    | `mx.nd.Activation(*args, **kwargs)` | `mx.npx.activation(*args, **kwargs)` | - moved to `npx` namespace with new name `activation`. |
+    | `mx.nd.elemwise_add(a, b)` | `a + b` | - just use the ndarray Python operator. |
+    | `mx.nd.elemwise_mul(a, b)` | `mx.np.multiply(a, b)` | - use the `multiply` operator in the `np` namespace. |
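The list-input convention for `concatenate`/`stack` mirrors NumPy's own API, so the change can be sketched with NumPy itself (illustrative only; the `mx.np` calls behave the same way):

```python
import numpy as np

a = np.ones((2, 3))
b = np.zeros((2, 3))

# mx.nd.concat(a, b) -> mx.np.concatenate([a, b]): a list, not positional arrays
c = np.concatenate([a, b], axis=0)   # shape (4, 3)

# mx.nd.stack(a, b) -> mx.np.stack([a, b]): stacks along a new leading axis
s = np.stack([a, b])                 # shape (2, 2, 3)

# mx.nd.elemwise_add / mx.nd.elemwise_mul -> plain `+` and np.multiply
e = a + b
m = np.multiply(a, b)
```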
+
+2. Operator migration requiring multiple steps: `mx.nd.mean` -> `mx.np.mean`:
+```{.python}
+import mxnet as mx
+
+data = mx.nd.ones((2, 3, 4))
+# Legacy: calculate the mean with reduction on axis 1, using the
+#         `exclude` option to reduce over all axes except axis 1
+nd_mean = mx.nd.mean(data, axis=1, exclude=True)
+
+# NumPy: no `exclude` option, but users can perform the following steps
+data = data.as_np_ndarray()
+axes = list(range(data.ndim))
+del axes[1]
+np_mean = mx.np.mean(data, axis=tuple(axes))
+```
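The axis bookkeeping above is plain NumPy semantics, so it can be verified with NumPy directly (a sketch assuming a 3-dimensional input):

```python
import numpy as np

data = np.arange(24.0).reshape(2, 3, 4)

# Reduce over every axis except axis 1, mimicking the legacy `exclude` option
axes = list(range(data.ndim))
del axes[1]
mean_excl = np.mean(data, axis=tuple(axes))   # shape (3,), one mean per axis-1 slice
```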
+
+3. Random Operators
+    | Legacy Operators | NumPy Operators Equivalent | Changes |
+    | --- | --- | --- |
+    | `mx.random.uniform(-1.0, 1.0, shape=(2, 3))` <br> `mx.nd.random.uniform(-1.0, 1.0, shape=(2, 3))` | `mx.np.random.uniform(-1.0, 1.0, size=(2, 3))` | For all the NumPy random operators, use the **size** keyword instead of **shape** |
+    | `mx.nd.random.multinomial(*args, **kwargs)` | `mx.npx.random.categorical(*args, **kwargs)` | [use `npx.random.categorical` to get the behavior of drawing 1 sample from multiple distributions.](https://github.com/apache/incubator-mxnet/issues/20373#issuecomment-869120214) |
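The **size** keyword matches NumPy's own random API, which illustrates the new calling convention:

```python
import numpy as np

# Legacy: mx.nd.random.uniform(-1.0, 1.0, shape=(2, 3))
# New:    mx.np.random.uniform(-1.0, 1.0, size=(2, 3)) -- same keyword as NumPy
samples = np.random.uniform(-1.0, 1.0, size=(2, 3))
```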
+
+4. Control Flow Operators
+    | Legacy Operators | NumPy Operators Equivalent | Changes |
+    | --- | --- | --- |
+    | `mx.nd.contrib.foreach(body, data, init_states, name)` | `mx.npx.foreach(body, data, init_states, name)` | - moved to `npx` namespace. <br> - Global variables are not supported as `body`'s inputs (`body`'s inputs must be either data or states or both) |
+    | `mx.nd.contrib.while_loop(cond, func, loop_vars, max_iterations, name)` | `mx.npx.while_loop(cond, func, loop_vars, max_iterations, name)` | - moved to `npx` namespace. <br> - Global variables are not supported as `cond`'s or `func`'s inputs (`cond`'s or `func`'s inputs must be in `loop_vars`) |
+    | `mx.nd.contrib.cond(pred, then_func, else_func, inputs, name)` | `mx.npx.cond(pred, then_func, else_func, name)` | - moved to `npx` namespace. <br> - users need to provide the inputs of `pred`, `then_func` and `else_func` as inputs <br> - Global variables are not supported as `pred`'s, `then_func`'s or `else_func`'s inputs (their inputs must be in `inputs`) |
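The contract of `foreach` — `body` maps a slice and the carried states to an output and new states, with outputs stacked along axis 0 — can be sketched in plain NumPy (a behavioral sketch, not the MXNet implementation):

```python
import numpy as np

def foreach_sketch(body, data, init_states):
    """Apply `body` to each slice of `data` along axis 0, threading states through."""
    states = init_states
    outputs = []
    for i in range(data.shape[0]):
        out, states = body(data[i], states)
        outputs.append(out)
    return np.stack(outputs), states

# Cumulative sum expressed as a foreach body
def body(x, states):
    acc = states[0] + x
    return acc, [acc]

outs, final_states = foreach_sketch(body, np.array([1.0, 2.0, 3.0]), [np.array(0.0)])
# outs -> [1., 3., 6.]
```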
+
+5. Functionalities
+    | Legacy Operators | NumPy Operators Equivalent | Changes |
+    | --- | --- | --- |
+    | `mx.nd.save(*args, **kwargs)` | `mx.npx.savez(*args, **kwargs)` | - moved to `npx` namespace. <br> - only accepts positional arguments; flatten the list/dict before passing it in |
+    | `mx.nd.load(*args, **kwargs)` | `mx.npx.load(*args, **kwargs)` | - moved to `npx` namespace. |
+    | `mx.nd.waitall()` | `mx.npx.waitall()` | - moved to `npx` namespace. |
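`mx.npx.savez` follows the `numpy.savez` convention of positional arrays (stored under the names `arr_0`, `arr_1`, ...), which is why a list or dict must be flattened before being passed in; NumPy itself shows the pattern:

```python
import io
import numpy as np

a = np.ones((2, 2))
b = np.zeros(3)

buf = io.BytesIO()
np.savez(buf, a, b)      # a list [a, b] would be flattened as np.savez(buf, *[a, b])
buf.seek(0)
loaded = np.load(buf)    # arrays available as loaded["arr_0"], loaded["arr_1"]
```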
+
+Other operator changes are included in [**Appendix/NumPy and NumPy-extension 
Operators**](#NumPy-and-NumPy-extension-Operators1) 
+
+
+
+### Layers and Blocks
+With deferred compute and tracing introduced in Gluon2.0, users no longer need to interact with symbols. There are many changes in building a model with the Gluon API, including parameter management and naming, forward-pass computation and parameter shape inference. We will provide step-by-step migration guidance on how to build a model with the new APIs.
+
+#### Parameter Management and Block Naming
+In Gluon, each Parameter or Block has a name (and prefix). Parameter names are specified by users and Block names can be either specified by users or automatically created. In Gluon 1.x, parameters are accessed via the `params` variable of the `ParameterDict` in `Block`. Users need to manually use `with self.name_scope():` for children blocks and specify a prefix for the top-level block. Otherwise, it will lead to wrong name scopes and can return parameters of children blocks that are not in the current name scope. An example of initializing the Block and Parameter in Gluon 1.x: 

Review comment:
       ```suggestion
   In Gluon, each Parameter or Block has a name (and prefix). Parameter names 
are specified by users and Block names can be either specified by users or 
automatically created. In Gluon 1.x, parameters are accessed via the `params` 
variable of the `ParameterDict` in `Block`. Users will need to manually use 
`with self.name_scope():` for children blocks and specify prefix for the top 
level block. Otherwise, it will lead to wrong name scopes and can return 
parameters of children blocks that are not in the current name scope. An 
example for initializing the Block and Parameter in Gluon 1.x: 
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+
+- The dataset is not fully [supported by 
backend](../../api/gluon/data/index.rst#mxnet.gluon.data.Dataset)(e.g., there 
are custom python datasets).

Review comment:
       ```suggestion
   - The dataset is not fully [supported by the 
backend](../../api/gluon/data/index.rst#mxnet.gluon.data.Dataset)(e.g., there 
are custom python datasets).
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+
+[Gluon2.0 
dataloader](../../api/gluon/data/index.rst#mxnet.gluon.data.DataLoader) will 
provide a new parameter called `try_nopython`. This parameter takes default 
value of None; when set to `True` the dataloader will compile python 
dataloading pipeline into pure MXNet c++ implementation. The compilation is not 
guaranteed to support all use cases, but it will fallback to python in case of 
failure: 

Review comment:
       ```suggestion
   [Gluon2.0 
dataloader](../../api/gluon/data/index.rst#mxnet.gluon.data.DataLoader) will 
provide a new parameter called `try_nopython`. This parameter takes a default 
value of None; when set to `True` the dataloader will compile the python 
dataloading pipeline into pure MXNet c++ implementation. The compilation is not 
guaranteed to support all use cases, but it will fallback to python in case of 
failure: 
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+## Modeling
+In Gluon2.0, users will have a brand new modeling experience with 
NumPy-compatible APIs and deferred compute mechanism. 

Review comment:
       ```suggestion
   In Gluon2.0, users will have a brand new modeling experience with 
NumPy-compatible APIs and the deferred compute mechanism. 
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+To help users migrate smoothly to use these simplified interface, we will 
provide the following guidance on how to replace legacy operators with 
NumPy-compatible operators, how to build models with `forward` instead of 
`hybrid_forward` and how to use `Parameter` class to register your parameters. 

Review comment:
       ```suggestion
   To help users migrate smoothly to use these simplified interfaces, we will 
provide the following guidance on how to replace legacy operators with 
NumPy-compatible operators, how to build models with `forward` instead of 
`hybrid_forward` and how to use `Parameter` class to register your parameters. 
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+- **Imperative-only coding experience**: with deferred compute and tracing 
being introduced, users only need to specify the computation through imperative 
coding but can still make hybridization work. Users will no longer need to 
interact with symbol APIs. 

Review comment:
       ```suggestion
   - **Imperative-only coding experience**: with the deferred compute and 
tracing being introduced, users only need to specify the computation through 
imperative coding but can still make hybridization work. Users will no longer 
need to interact with symbol APIs. 
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+
+
+### NumPy-compatible Programming Experience
+#### NumPy Arrays
+MXNet [NumPy ndarray(i.e. `mx.np.ndarray`)](../../api/np/arrays.ndarray.html) 
is a multidimensional container of items of the same type and size. Most of its 
properties and attributes are the same as legacy NDArrays(i.e. 
`mx.nd.ndarray`), so users can use NumPy array library just as they did with 
legacy NDArrays. But, there are still some changes and deprecations that needs 
attention, as mentioned below. 

Review comment:
       ```suggestion
   MXNet [NumPy ndarray(i.e. 
`mx.np.ndarray`)](../../api/np/arrays.ndarray.html) is a multidimensional 
container of items of the same type and size. Most of its properties and 
attributes are the same as legacy NDArrays(i.e. `mx.nd.ndarray`), so users can 
use the NumPy array library just as they did with legacy NDArrays. But, there 
are still some changes and deprecations that need attention, as mentioned 
below. 
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+**Migration Guide**: 
+
+1. Currently, NumPy ndarray only supports the `default` storage type; other 
storage types, like `row_sparse` and `csr`, are not supported. Also, the 
`tostype()` attribute is deprecated. 
+
+2. Users can use the `as_np_ndarray` attribute to switch from a legacy NDArray 
to a NumPy ndarray, like this:
+    ```{.python}
+    import mxnet as mx
+    # start from a legacy NDArray, then convert to a NumPy ndarray
+    nd_array = mx.nd.ones((5, 3))
+    np_array = nd_array.as_np_ndarray()
+    ```
+
+3. Compared with legacy NDArray, some attributes are deprecated in NumPy 
ndarray. Listed below are some of the deprecated APIs and their corresponding 
replacements in NumPy ndarray, others can be found in [**Appendix/NumPy Array 
Deprecated Attributes**](#NumPy-Array-Deprecated-Attributes).
+    | Deprecated Attributes | NumPy ndarray Equivalent |
+    | --------------------- | ------------------------ |
+    | `a.asscalar()`        | `a.item()`               |
+    | `a.as_in_context()`   | `a.as_in_ctx()`          |
+    | `a.context`           | `a.ctx`                  |
+    | `a.reshape_like(b)`   | `a.reshape(b.shape)`     |
+    | `a.zeros_like(b)`     | `mx.np.zeros_like(b)`    |
+    | `a.ones_like(b)`      | `mx.np.ones_like(b)`     |
+
+4. Compared with legacy NDArray, some attributes will have different behaviors 
and take different inputs. 
+    |          Attribute            | Legacy Inputs | NumPy Inputs |
+    | ----------------------------- | ------------------------ | -------- |
+    | `a.reshape(*args, **kwargs)`  | **shape**: Some dimensions of the shape 
can take special values from the set {0, -1, -2, -3, -4}. <br> The significance 
of each is explained below: <br>  ``0``  copy this dimension from the input to 
the output shape. <br>  ``-1`` infers the dimension of the output shape by 
using the remainder of the input dimensions. <br> ``-2`` copy all/remainder of 
the input dimensions to the output shape. <br> ``-3`` use the product of two 
consecutive dimensions of the input shape as the output dimension. <br> ``-4`` 
split one dimension of the input into two dimensions passed subsequent to -4 in 
shape (can contain -1). <br> **reverse**: If set to 1, then the special values 
are inferred from right to left | **shape**: shape parameter will be 
**positional argument** rather than key-word argument. <br> Some dimensions of 
the shape can take special values from the set {-1, -2, -3, -4, -5, -6}. <br> 
The significance of each is explained below: <br>  ``-1`` infers 
 the dimension of the output shape by using the remainder of the input 
dimensions. <br> ``-2`` copy this dimension from the input to the output shape. 
<br> ``-3`` will skip current dimension if and only if the current dim size is 
one. <br> ``-4`` copy all remain of the input dimensions to the output shape. 
<br> ``-5`` use the product of two consecutive dimensions of the input shape as 
the output. <br> ``-6`` split one dimension of the input into two dimensions 
passed subsequent to -6 in the new shape. <br> **reverse**: No **reverse** 
parameter for `np.reshape` but for `npx.reshape`. <br> **order**: Read the 
elements of `a` using this index order, and place the elements into the 
reshaped array using this index order. |

Review comment:
       ```suggestion
       | `a.reshape(*args, **kwargs)`  | **shape**: Some dimensions of the 
shape can take special values from the set {0, -1, -2, -3, -4}. <br> The 
significance of each is explained below: <br>  ``0``  copy this dimension from 
the input to the output shape. <br>  ``-1`` infers the dimension of the output 
shape by using the remainder of the input dimensions. <br> ``-2`` copy 
all/remainder of the input dimensions to the output shape. <br> ``-3`` use the 
product of two consecutive dimensions of the input shape as the output 
dimension. <br> ``-4`` split one dimension of the input into two dimensions 
passed subsequent to -4 in shape (can contain -1). <br> **reverse**: If set to 
1, then the special values are inferred from right to left | **shape**: shape 
parameter will be **positional argument** rather than key-word argument. <br> 
Some dimensions of the shape can take special values from the set {-1, -2, -3, 
-4, -5, -6}. <br> The significance of each is explained below: <br>  ``-1`` 
infers the dimension of the output shape by using the remainder of the input 
dimensions. <br> ``-2`` copy this dimension from the input to the output shape. 
<br> ``-3`` skip the current dimension if and only if the current dim size is 
one. <br> ``-4`` copy all the remaining input dimensions to the output 
shape. <br> ``-5`` use the product of two consecutive dimensions of the input 
shape as the output. <br> ``-6`` split one dimension of the input into two 
dimensions passed subsequent to -6 in the new shape. <br> **reverse**: No 
**reverse** parameter for `np.reshape` but for `npx.reshape`. <br> **order**: 
Read the elements of `a` using this index order, and place the elements into 
the reshaped array using this index order. |
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+- Bachify is not fully [supported by 
backend](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/data/batchify.py).
 

Review comment:
       ```suggestion
   - Batchify is not fully [supported by the 
backend](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/data/batchify.py).
 
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+
+# Gluon2.0: Migration Guide
+
+## Overview
+Since the introduction of the Gluon API in MXNet 1.x, it has superceded 
commonly used symbolic, module and model APIs for model development. In fact, 
Gluon was the first in deep learning community to unify the flexibility of 
imperative programming with the performance benefits of symbolic programming, 
through just-in-time compilation. 
+
+In Gluon2.0, we extend the support to MXNet numpy and numpy extension with 
simplified interface and new functionalities: 
+
+- **Simplified hybridization with deferred compute and tracing**: Deferred 
compute allows the imperative execution to be used for graph construction, 
which allows us to unify the historic divergence of NDArray and Symbol. 
Hybridization now works in a simplified hybrid forward interface. Users only 
need to specify the computation through imperative programming. Hybridization 
also works through tracing, i.e. tracing the data flow of the first input data 
to create graph.
+
+- **Data 2.0**: The new design for data loading in Gluon allows hybridizing 
and deploying data processing pipeline in the same way as model hybridization. 
The new C++ data loader improves data loading efficiency on CIFAR 10 by 50%.
+
+- **Distributed 2.0**: The new distributed-training design in Gluon 2.0 
provides a unified distributed data parallel interface across native Parameter 
Server, BytePS, and Horovod, and is extensible for supporting custom 
distributed training libraries.
+
+- **Gluon Probability**: parameterizable probability distributions and 
sampling functions to facilitate more areas of research such as Baysian methods 
and AutoML.
+
+- **Gluon Metrics** and **Optimizers**: refactored with MXNet numpy interface 
and addressed legacy issues.
+
+Adopting these new functionalities may or may not require modifications on 
your models. But don't worry, this migration guide will go through a high-level 
mapping from old functionality to new APIs and make Gluon2.0 migration a 
hassle-free experience.  
+
+## Data Pipeline
+**What's new**: In Gluon2.0, `MultithreadingDataLoader` is introduced to speed 
up the data loading pipeline. It will use the pure MXNet C++ implementation of 
dataloader, datasets and batchify functions. So, you can use either MXNet 
internal multithreading mode dataloader or python multiprocessing mode 
dataloader in Gluon2.0. 
+
+**Migration Guide**: Users can continue with the traditional 
gluon.data.Dataloader and the C++ backend will be applied automatically. 
+
+The [Gluon2.0 dataloader](../../api/gluon/data/index.rst#mxnet.gluon.data.DataLoader) provides a new parameter called `try_nopython`. This parameter defaults to `None`; when set to `True`, the dataloader will compile the Python data loading pipeline into the pure MXNet C++ implementation. The compilation is not guaranteed to support all use cases, and it will fall back to Python when any of the following holds: 
+
+- The dataset is not fully [supported by the backend](../../api/gluon/data/index.rst#mxnet.gluon.data.Dataset) (e.g., there are custom Python datasets).
+
+- The transform is not fully hybridizable. 
+
+- The batchify function is not fully [supported by the backend](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/data/batchify.py).
+
+
+You can refer to [Step 5 in the Crash Course](https://mxnet.apache.org/versions/master/api/python/docs/tutorials/getting-started/crash-course/5-datasets.html#New-in-MXNet-2.0:-faster-C++-backend-dataloaders) for details on the performance improvement with the C++ backend. 
+
+## Modeling
+In Gluon2.0, users will have a brand new modeling experience with NumPy-compatible APIs and the deferred compute mechanism. 
+
+- **NumPy-compatible programming experience**: users can build their models with the MXNet NumPy array library, NumPy-compatible math operators and some neural network extension operators. 
+
+- **Imperative-only coding experience**: with deferred compute and tracing 
being introduced, users only need to specify the computation through imperative 
coding but can still make hybridization work. Users will no longer need to 
interact with symbol APIs. 
+
+To help users migrate smoothly to these simplified interfaces, we provide the following guidance on how to replace legacy operators with NumPy-compatible operators, how to build models with `forward` instead of `hybrid_forward`, and how to use the `Parameter` class to register your parameters. 
+
+
+### NumPy-compatible Programming Experience
+#### NumPy Arrays
+MXNet [NumPy ndarray (i.e. `mx.np.ndarray`)](../../api/np/arrays.ndarray.html) is a multidimensional container of items of the same type and size. Most of its properties and attributes are the same as those of legacy NDArrays (i.e. `mx.nd.ndarray`), so users can use the NumPy array library just as they did with legacy NDArrays. But there are still some changes and deprecations that need attention, as mentioned below. 
+
+**Migration Guide**: 
+
+1. Currently, NumPy ndarray only supports the `default` storage type; other storage types, like `row_sparse` and `csr`, are not supported. Also, the `tostype()` attribute is deprecated. 
+
+2. Users can use the `as_np_ndarray` method to convert a legacy NDArray to a NumPy ndarray, like this:
+    ```{.python}
+    import mxnet as mx
+    nd_array = mx.ones((5,3))
+    np_array = nd_array.as_np_ndarray()
+    ```
+
+3. Compared with legacy NDArray, some attributes are deprecated in NumPy 
ndarray. Listed below are some of the deprecated APIs and their corresponding 
replacements in NumPy ndarray, others can be found in [**Appendix/NumPy Array 
Deprecated Attributes**](#NumPy-Array-Deprecated-Attributes).
+    |                   Deprecated Attributes               |    NumPy ndarray 
Equivalent    |
+    | ----------------------------------------------------- | 
------------------------------ |
+    |                   `a.asscalar()`                      |         
`a.item()`         |
+    |                 `a.as_in_context()`                   |      
`a.as_in_ctx()`       |
+    |                    `a.context`                        |          `a.ctx` 
          |
+    |                   `a.reshape_like(b)`                 |    
`a.reshape(b.shape)`    |
+    |                    `a.zeros_like(b)`                  |   
`mx.np.zeros_like(b)`  |
+    |                    `a.ones_like(b)`                   |   
`mx.np.ones_like(b)`   |
+
+4. Compared with legacy NDArray, some attributes will have different behaviors 
and take different inputs. 
+    |          Attribute            | Legacy Inputs | NumPy Inputs |
+    | ----------------------------- | ------------------------ | -------- |
+    | `a.reshape(*args, **kwargs)`  | **shape**: Some dimensions of the shape 
can take special values from the set {0, -1, -2, -3, -4}. <br> The significance 
of each is explained below: <br>  ``0``  copy this dimension from the input to 
the output shape. <br>  ``-1`` infers the dimension of the output shape by 
using the remainder of the input dimensions. <br> ``-2`` copy all/remainder of 
the input dimensions to the output shape. <br> ``-3`` use the product of two 
consecutive dimensions of the input shape as the output dimension. <br> ``-4`` 
split one dimension of the input into two dimensions passed subsequent to -4 in 
shape (can contain -1). <br> **reverse**: If set to 1, then the special values 
are inferred from right to left | **shape**: shape parameter will be 
**positional argument** rather than key-word argument. <br> Some dimensions of 
the shape can take special values from the set {-1, -2, -3, -4, -5, -6}. <br> 
The significance of each is explained below: <br>  ``-1`` infers 
 the dimension of the output shape by using the remainder of the input 
dimensions. <br> ``-2`` copy this dimension from the input to the output shape. 
<br> ``-3`` will skip current dimension if and only if the current dim size is 
one. <br> ``-4`` copy all remain of the input dimensions to the output shape. 
<br> ``-5`` use the product of two consecutive dimensions of the input shape as 
the output. <br> ``-6`` split one dimension of the input into two dimensions 
passed subsequent to -6 in the new shape. <br> **reverse**: No **reverse** 
parameter for `np.reshape` but for `npx.reshape`. <br> **order**: Read the 
elements of `a` using this index order, and place the elements into the 
reshaped array using this index order. |
+
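Since `mx.np` is designed to mirror official NumPy semantics, most of the attribute replacements above can be sanity-checked with plain NumPy. A minimal sketch, using `numpy` as a stand-in for `mx.np` (the `mx.np` calls behave the same way on these examples):

```python
import numpy as np

a = np.ones((2, 3))
b = np.zeros((3, 2))

# a.item() replaces a.asscalar() for size-1 arrays
scalar = np.ones((1,)).item()

# a.reshape(b.shape) replaces a.reshape_like(b)
reshaped = a.reshape(b.shape)

# free functions replace the *_like methods
zeros = np.zeros_like(a)
ones = np.ones_like(a)

# np-style reshape: the shape is positional and -1 infers the remaining dim
inferred = np.arange(12).reshape(2, -1)   # shape (2, 6)
```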
+
+#### NumPy and NumPy-extension Operators
+Most of the legacy NDArray operators (`mx.nd.op`) have equivalents in the `np`/`npx` namespaces; users can simply replace them with `mx.np.op` or `mx.npx.op` to migrate. Some of the operators have different inputs and behaviors, as listed in the tables below. 
+
+**Migration Guide**:
+
+1. Operators migration with name/inputs changes
+    |                   Legacy Operators               |    NumPy Operators 
Equivalent    |   Changes  |
+    | ----------------------------------------------------- | 
------------------------------ | ------------------- |
+    |       `mx.nd.flatten(*args, **kwargs)`                |            
`mx.npx.batch_flatten(*args, **kwargs)`                    |                
moved to `npx` namespace with new name `batch_flatten`            |
+    |       `mx.nd.concat(a, b, c)`                |            
`mx.np.concatenate([a, b, c])`                    |              - moved to 
`np` namespace with new name `concatenate`. <br> - use list of ndarrays as 
input rather than positional ndarrays           |
+    |        `mx.nd.stack(a, b, c)`                 |            
`mx.np.stack([a, b, c])`                    |              - moved to `np` 
namespace. <br> - use list of ndarrays as input rather than positional ndarrays 
          |
+    |      `mx.nd.SliceChannel(*args, **kwargs)`              |            
`mx.npx.slice_channel(*args, **kwargs)`                 |              - moved 
to `npx` namespace with new name `slice_channel`.          |
+    |      `mx.nd.FullyConnected(*args, **kwargs)`              |            
`mx.npx.fully_connected(*args, **kwargs)`                 |              - 
moved to `npx` namespace with new name `fully_connected`.          |
+    |      `mx.nd.Activation(*args, **kwargs)`              |            
`mx.npx.activation(*args, **kwargs)`                 |              - moved to 
`npx` namespace with new name `activation`.          |
+    |      `mx.nd.elemwise_add(a, b)`              |            `a + b`        
         |              - Just use ndarray python operator.          |
+    |      `mx.nd.elemwise_mul(a, b)`              |            
`mx.np.multiply(a, b)`                 |              - Use `multiply` operator 
in `np` namespace.          |
+
+2. Operators migration with multiple steps: `mx.nd.mean` -> `mx.np.mean`:
+```{.python}
+import mxnet as mx
+# `data` is a predefined ndarray
+# Legacy: compute the mean with reduction on axis 1,
+#         with the `exclude` option turned on
+nd_mean = mx.nd.mean(data, axis=1, exclude=True)
+
+# NumPy: there is no `exclude` option, but users can
+#        achieve the same result as follows
+axes = list(range(data.ndim))
+del axes[1]
+np_mean = mx.np.mean(data, axis=axes)
+```
+
+3. Random Operators
+    |                   Legacy Operators               |    NumPy Operators 
Equivalent    |   Changes  |
+    | ----------------------------------------------------- | 
------------------------------ | ---------------------------- |
+    |       `mx.random.uniform(-1.0, 1.0, shape=(2, 3))` <br> 
`mx.nd.random.uniform(-1.0, 1.0, shape=(2, 3))`                |            
`mx.np.random.uniform(-1.0, 1.0, size=(2, 3))`                    |             
   For all the NumPy random operators, use the **size** keyword instead of **shape**           |
+    |       `mx.nd.random.multinomial(*args, **kwargs)`              |         
   `mx.npx.random.categorical(*args, **kwargs)`                    |            
    [use `npx.random.categorical` to have the behavior of drawing 1 sample from 
multiple 
distributions.](https://github.com/apache/incubator-mxnet/issues/20373#issuecomment-869120214)
           |
+
+4. Control Flow Operators
+    |                   Legacy Operators               |    NumPy Operators 
Equivalent    |   Changes  |
+    | ----------------------------------------------------- | 
------------------------------ | ------------------- |
+    |       `mx.nd.contrib.foreach(body, data, init_states, name)`                |            `mx.npx.foreach(body, data, init_states, name)`                    |                - moved to `npx` namespace. <br> - Will not support global variables as body's inputs (body's inputs must be either data or states or both)           |
+    |       `mx.nd.contrib.while_loop(cond, func, loop_vars, max_iterations, name)`                |            `mx.npx.while_loop(cond, func, loop_vars, max_iterations, name)`                    |                - moved to `npx` namespace. <br> - Will not support global variables as cond's or func's inputs (cond's or func's inputs must be in loop_vars)           |
+    |       `mx.nd.contrib.cond(pred, then_func, else_func, inputs, name)`                |            `mx.npx.cond(pred, then_func, else_func, name)`                    |                - moved to `npx` namespace. <br> - users need to provide the inputs of pred, then_func and else_func as inputs <br> - Will not support global variables as pred's, then_func's or else_func's inputs (these inputs must be in inputs)           |
+
+5. Functionalities
+    |                   Legacy Operators               |    NumPy Operators 
Equivalent    |   Changes  |
+    | ----------------------------------------------------- | 
------------------------------ | ------------------- |
+    |       `mx.nd.save(*args, **kwargs)`                |            `mx.npx.savez(*args, **kwargs)`                    |                - moved to `npx` namespace. <br> - Only accepts positional arguments; flatten the list/dict before feeding in          |
+    |       `mx.nd.load(*args, **kwargs)`                |            
`mx.npx.load(*args, **kwargs)`                    |                - moved to 
`npx` namespace.         |
+    |       `mx.nd.waitall()`                |            `mx.npx.waitall()`   
                 |                - moved to `npx` namespace.         |
+
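The calling conventions in the tables above follow official NumPy, so they can also be checked with plain NumPy. A minimal sketch, using `numpy` in place of `mx.np` (shapes and results are the same for the `mx.np` versions):

```python
import numpy as np

a, b, c = np.ones((2, 3)), np.zeros((2, 3)), np.full((2, 3), 2.0)

# legacy mx.nd.concat(a, b, c, dim=0) -> concatenate takes a list
cat = np.concatenate([a, b, c], axis=0)    # shape (6, 3)

# legacy mx.nd.stack(a, b, c) -> stack takes a list
stk = np.stack([a, b, c])                  # shape (3, 2, 3)

# legacy elemwise_add / elemwise_mul -> plain operator / np.multiply
s = a + c
m = np.multiply(a, c)

# random operators use size= instead of shape=
u = np.random.uniform(-1.0, 1.0, size=(2, 3))

# the `exclude` recipe from step 2: reduce over every axis except axis 1
data = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
axes = [ax for ax in range(data.ndim) if ax != 1]
mean = np.mean(data, axis=tuple(axes))     # shape (3,)
```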
+Other operator changes are included in [**Appendix/NumPy and NumPy-extension 
Operators**](#NumPy-and-NumPy-extension-Operators1) 
+
+
+
+### Layers and Blocks
+With deferred compute and tracing being introduced in Gluon2.0, users do not 
need to interact with symbols any more. There are a lot of changes in building 
a model with Gluon API, including parameter management and naming, forward pass 
computing and parameter shape inferencing. We will provide a step-by-step 
migration guidance on how to build a model with new APIs.

Review comment:
       ```suggestion
   With the deferred compute and tracing being introduced in Gluon2.0, users do 
not need to interact with symbols any more. There are a lot of changes in 
building a model with Gluon API, including parameter management and naming, 
forward pass computing and parameter shape inferencing. We provide step-by-step 
migration guidance on how to build a model with new APIs.
   ```

##########
File path: 
docs/python_docs/python/tutorials/getting-started/gluon_migration_guide.md
##########
@@ -0,0 +1,453 @@
+### Layers and Blocks
+With deferred compute and tracing being introduced in Gluon2.0, users do not 
need to interact with symbols any more. There are a lot of changes in building 
a model with Gluon API, including parameter management and naming, forward pass 
computing and parameter shape inferencing. We will provide a step-by-step 
migration guidance on how to build a model with new APIs.
+
+#### Parameter Management and Block Naming
+In Gluon, each Parameter or Block has a name (and prefix). Parameter names are 
specified by users and Block names can be either specified by users or 
automatically created. In Gluon 1.x, parameters are accessed via the `params` 
variable of the `ParameterDict` in `Block`. Users will need to manually use 
`with self.name_scope():` for children blocks and specify prefix for the top 
level block. Otherwise, it will lead to wrong name scopes and can return 
parameters of children blocks that are not in current name scope. An example 
for initializing the Block and Parameter in Gluon 1.x: 
+```{.python}
+from mxnet.gluon import Parameter, Constant, HybridBlock
+class SampleBlock(HybridBlock):
+    def __init__(self):
+        super(SampleBlock, self).__init__()
+        with self.name_scope():
+            # Create parameters, which are updated during training
+            self.weight = self.params.get('weight')
+            # Create constant parameters, which are not updated during
+            # training (`const_arr` is a predefined array)
+            self.const = self.params.get_constant('const', const_arr)
+```
+Now in Gluon 2.0, Block/HybridBlock objects no longer maintain the parameter dictionary (`ParameterDict`). Instead, users create parameters via the `Parameter` and `Constant` classes. These parameters are registered automatically as part of the Block, and users no longer need to manage the name scope for children blocks, so the `with self.name_scope():` statement can be removed. For example: 
+```{.python}
+class SampleBlock(HybridBlock):
+    def __init__(self):
+        super(SampleBlock, self).__init__()
+        # Create parameters, which are updated during training
+        self.weight = Parameter('weight')
+        # Create constant parameters, which are not updated during training
+        self.const = Constant('const', const_arr)
+```
+Also, there will be new mechanism for parameter loading, sharing and setting 
context. 

Review comment:
       ```suggestion
   Also, there will be new mechanisms for parameter loading, sharing and 
setting context. 
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

