szha commented on a change in pull request #20262:
URL: https://github.com/apache/incubator-mxnet/pull/20262#discussion_r646148463
##########
File path: python/mxnet/gluon/data/vision/transforms/__init__.py
##########
@@ -129,10 +131,8 @@ def __init__(self, dtype='float32'):
super(Cast, self).__init__()
self._dtype = dtype
- def hybrid_forward(self, F, *args):
- if is_np_array():
- F = F.npx
- return tuple([F.cast(x, self._dtype) for x in args])
+ def forward(self, *args):
+ return tuple([x.astype(self._dtype) for x in args])
Review comment:
nit: the list doesn't seem necessary
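For illustration, the shape of this nit in plain Python (independent of MXNet): `tuple()` consumes any iterable, so a generator expression avoids allocating an intermediate list.

```python
# Plain-Python sketch of the nit above: tuple() accepts any iterable,
# so the intermediate list comprehension can become a generator expression.
args = [1, 2, 3]

with_list = tuple([x * 2 for x in args])  # builds a temporary list first
with_gen = tuple(x * 2 for x in args)     # feeds tuple() directly

assert with_list == with_gen == (2, 4, 6)
```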
##########
File path:
docs/python_docs/python/tutorials/packages/gluon/blocks/custom-layer.md
##########
@@ -131,50 +128,47 @@ Output:
[-0.05046433]
[-1.2375476 ]
[-0.15506986]]
-<NDArray 5x1 @cpu(0)>
```
## Parameters of a custom layer
Usually, a layer has a set of associated parameters, sometimes also referred
to as weights. This is an internal state of a layer. Most often, these parameters
are the ones that we want to learn during the backpropagation step, but sometimes
these parameters might be just constants we want to use during the forward pass.
-All parameters of a block are stored and accessed via
[ParameterDict](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/parameter.py#L508)
class. This class helps with initialization, updating, saving and loading of
the parameters. Each layer can have multiple set of parameters, and all of them
can be stored in a single instance of the `ParameterDict` class. On a block
level, the instance of the `ParameterDict` class is accessible via
`self.params` field, and outside of a block one can access all parameters of
the network via
[collect_params()](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Block.collect_params)
method called on a `container`. `ParameterDict` uses
[Parameter](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Parameter)
class to represent parameters inside of Apache MxNet neural network. If
parameter doesn't exist, trying to get a parameter via `self.params` will
create it automatically.
+All parameters of a block are stored and accessed via
[ParameterDict](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/parameter.py#L508)
class. This class helps with initialization, updating, saving and loading of
the parameters. Each layer can have multiple set of parameters, and all of them
can be stored in a single instance of the `ParameterDict` class. On a block
level, the instance of the `ParameterDict` class is accessible via
`self.params` field, and outside of a block one can access all parameters of
the network via
[collect_params()](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Block.collect_params)
method called on a `container`. `ParameterDict` uses
[Parameter](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Parameter)
class to represent parameters inside of Apache MxNet neural network.
Review comment:
We don't have a custom class for the dictionary of parameters.
`collect_params()` now returns a regular dictionary. @leezu anything you want
to highlight here?
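As an illustration of that change (a plain-Python sketch, not MXNet code; the `Parameter` stand-in below is hypothetical), a regular dict of parameters supports ordinary dict idioms directly:

```python
# Illustrative sketch of the point above: in Gluon 2, collect_params()
# returns a plain Python dict mapping parameter names to Parameter objects.
# The Parameter class here is a minimal hypothetical stand-in.
class Parameter:
    def __init__(self, name, shape):
        self.name, self.shape = name, shape

# what a collect_params()-style call might return
params = {
    'dense0.weight': Parameter('dense0.weight', (5, 10)),
    'dense0.bias': Parameter('dense0.bias', (5,)),
}

# plain dict idioms apply: filtering, iteration, membership tests
weights = {k: v for k, v in params.items() if k.endswith('weight')}
assert list(weights) == ['dense0.weight']
assert 'dense0.bias' in params
```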
##########
File path: python/mxnet/gluon/rnn/rnn_layer.py
##########
@@ -182,65 +180,79 @@ def __call__(self, inputs, states=None, sequence_length=None, **kwargs):
else:
return super(_RNNLayer, self).__call__(inputs, states, **kwargs)
- def hybrid_forward(self, F, inputs, states, sequence_length=None, **kwargs):
- if F is ndarray:
- batch_size = inputs.shape[self._layout.find('N')]
+ def forward(self, inputs, states, sequence_length=None):
+ batch_size = inputs.shape[self._layout.find('N')]
- if F is ndarray:
- for state, info in zip(states, self.state_info(batch_size)):
- if state.shape != info['shape']:
- raise ValueError(
- "Invalid recurrent state shape. Expecting %s, got
%s."%(
- str(info['shape']), str(state.shape)))
- out = self._forward_kernel(F, inputs, states, sequence_length, **kwargs)
+ for state, info in zip(states, self.state_info(batch_size)):
+ if state.shape != info['shape']:
+ raise ValueError(
+ "Invalid recurrent state shape. Expecting %s, got %s."%(
+ str(info['shape']), str(state.shape)))
+ out = self._forward_kernel(inputs, states, sequence_length)
# out is (output, state)
return out[0] if self.skip_states else out
- def _forward_kernel(self, F, inputs, states, sequence_length, **kwargs):
+ def infer_shape(self, inputs, *args):
+ assert inputs.ndim == 3, \
+ "Input data should be rank-3 tensor of dim [sequence length, batch
size, input size]"
+ if not self._projection_size:
+ step = self._hidden_size
+ else:
+ step = self._projection_size
+ ni = inputs.shape[2]
+ for i in range(self._num_layers):
+ for j in ['l', 'r'][:self._dir]:
+ name = '{}{}_i2h_weight'.format(j, i)
+ getattr(self, name).shape = (self._gates*self._hidden_size, ni)
+ ni = step * self._dir
+
+ def _forward_kernel(self, inputs, states, sequence_length):
""" forward using CUDNN or CPU kenrel"""
- swapaxes = F.np.swapaxes if is_np_array() else F.swapaxes
+ ctx = inputs.ctx
if self._layout == 'NTC':
- inputs = swapaxes(inputs, 0, 1)
+ inputs = np.swapaxes(inputs, 0, 1)
if self._projection_size is None:
- params = (kwargs['{}{}_{}_{}'.format(d, l, g, t)].reshape(-1)
+ params = (getattr(self, '{}{}_{}_{}'.format(d, l, g, t)).data(ctx).reshape(-1)
for t in ['weight', 'bias']
for l in range(self._num_layers)
for d in ['l', 'r'][:self._dir]
for g in ['i2h', 'h2h'])
else:
- params = (kwargs['{}{}_{}_{}'.format(d, l, g, t)].reshape(-1)
+ params = (getattr(self, '{}{}_{}_{}'.format(d, l, g, t)).data(ctx).reshape(-1)
for t in ['weight', 'bias']
for l in range(self._num_layers)
for d in ['l', 'r'][:self._dir]
for g in ['i2h', 'h2h', 'h2r']
if g != 'h2r' or t != 'bias')
- rnn_param_concat = F.np._internal.rnn_param_concat if is_np_array()\
- else F._internal._rnn_param_concat
- params = rnn_param_concat(*params, dim=0)
+ params = ndarray.np._internal.rnn_param_concat(*params, dim=0)
Review comment:
I think we should get rid of `rnn_param_concat` by
1. registering only a fused parameter for the RNN layer
2. adding a utility for converting the fused parameter to split parameters for
consumption in RNN cells only as needed.
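A minimal sketch of the utility in point 2 (pure Python; the names and the flat, equal-chunk layout are hypothetical, not MXNet's actual fused layout):

```python
# Hedged sketch: slice a single fused 1-D parameter into named per-gate
# pieces for consumption by RNN cells. Naming and chunking are illustrative.
def split_fused_param(fused, num_layers, gates):
    """Split the flat list `fused` into equal chunks, one per (layer, gate)."""
    n = num_layers * len(gates)
    assert len(fused) % n == 0, "fused parameter size must divide evenly"
    step = len(fused) // n
    out = {}
    for l in range(num_layers):
        for g, gate in enumerate(gates):
            start = (l * len(gates) + g) * step
            out['l{}_{}_weight'.format(l, gate)] = fused[start:start + step]
    return out

parts = split_fused_param(list(range(8)), num_layers=2, gates=['i2h', 'h2h'])
assert parts['l0_i2h_weight'] == [0, 1]
assert parts['l1_h2h_weight'] == [6, 7]
```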
##########
File path: tests/python/unittest/test_contrib_control_flow.py
##########
@@ -26,7 +26,9 @@
from mxnet.base import _as_list
from mxnet.attribute import AttrScope
+mx.npx.reset_np()
Review comment:
does this work for multiprocess testing?
##########
File path: tests/python/unittest/test_gluon.py
##########
@@ -353,145 +240,16 @@ def test_sparse_hybrid_block():
params['bias'] = gluon.Parameter('bias', shape=(5), dtype='float32')
net = gluon.nn.Dense(5).share_parameters(params)
net.initialize()
- x = mx.nd.ones((2,5))
+ x = mx.np.ones((2,5))
with pytest.raises(RuntimeError):
# an exception is expected when forwarding a HybridBlock w/ sparse param
y = net(x)
-def test_hybrid_block_none_args():
Review comment:
it tests the usage of None in block forward arguments, which is a needed
use case, so the test should be adapted instead of removed.
##########
File path: tests/python/unittest/test_contrib_control_flow.py
##########
@@ -1053,66 +1055,6 @@ def cond(inputs, free):
]
)
-class RNNLayer(gluon.HybridBlock):
- def __init__(self, cell_type, hidden_size):
- super(RNNLayer, self).__init__()
- self.cell = cell_type(hidden_size)
-
- def hybrid_forward(self, F, inputs, states):
- out, states = F.contrib.foreach(self.cell, inputs, states)
- return out
-
-def check_contrib_rnn(cell_type, num_states):
- batch_size = 10
- hidden_size = 100
- rnn_data = mx.nd.normal(loc=0, scale=1, shape=(5, batch_size, 50))
- state_shape = (batch_size, hidden_size)
- states = [mx.nd.normal(loc=0, scale=1, shape=state_shape) for i in range(num_states)]
- layer = RNNLayer(cell_type, hidden_size)
- layer.initialize(ctx=default_context())
- res1 = layer(rnn_data, states)
- params1 = layer.collect_params()
- orig_params1 = copy.deepcopy(params1)
-
- trainer = gluon.Trainer(params1, 'sgd', {'learning_rate' : 0.03})
- with mx.autograd.record():
- res1 = layer(rnn_data, states)
- res1.backward()
- trainer.step(batch_size)
-
- configs = [
- {},
- {'inline_limit': 0},
- {'static_alloc': True},
- {'static_alloc': True, 'static_shape': True} ]
- for config in configs:
- layer = RNNLayer(cell_type, hidden_size)
- layer.initialize(ctx=default_context())
- layer.hybridize(**config)
- res2 = layer(rnn_data, states)
- params2 = layer.collect_params()
- for key, val in orig_params1.items():
- params2[key].set_data(copy.deepcopy(val.data()))
- trainer = gluon.Trainer(params2, 'sgd', {'learning_rate' : 0.03})
- with mx.autograd.record():
- res2 = layer(rnn_data, states)
- assert_almost_equal(res1.asnumpy(), res2.asnumpy(), rtol=1e-3, atol=1e-3)
- res2.backward()
- trainer.step(batch_size)
-
- for key, val in params1.items():
- weight1 = val.data()
- weight2 = params2[key].data()
- assert_almost_equal(weight1.asnumpy(), weight2.asnumpy(),
- rtol=1e-3, atol=1e-3)
-
-
-def test_contrib_rnn():
- cell_types = [(gluon.rnn.RNNCell, 1), (gluon.rnn.LSTMCell, 2),
- (gluon.rnn.GRUCell, 1)]
- for cell_type, num_states in cell_types:
- check_contrib_rnn(cell_type, num_states)
Review comment:
why is this test removed?
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -32,49 +31,19 @@ def check_rnn_states(fused_states, stack_states, num_layers, bidirectional=False
assert len(stack_states) / len(fused_states) == num_layers * directions
fused_states = [state.asnumpy() for state in fused_states]
- stack_states = [np.expand_dims(state.asnumpy(), axis=0) for state in stack_states]
+ stack_states = [_np.expand_dims(state.asnumpy(), axis=0) for state in stack_states]
if is_lstm:
stack_states_h = stack_states[0::2]
stack_states_c = stack_states[1::2]
- stack_states = [np.concatenate(stack_states_h, axis=0), np.concatenate(stack_states_c, axis=0)]
+ stack_states = [_np.concatenate(stack_states_h, axis=0), _np.concatenate(stack_states_c, axis=0)]
else:
- stack_states = [np.concatenate(stack_states, axis=0)]
+ stack_states = [_np.concatenate(stack_states, axis=0)]
for f, s in zip(fused_states, stack_states):
assert f.shape == s.shape
assert_almost_equal(f, s, atol=1e-4, rtol=1e-4)
-def test_rnn():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -585,30 +374,30 @@ def check_rnn_layer_forward(layer, inputs, states=None, run_only=False, ctx=mx.c
mx.test_utils.assert_almost_equal(np_dx, inputs.grad.asnumpy(), rtol=1e-3, atol=1e-5)
-
[email protected]_np
def run_rnn_layers(dtype, dtype2, ctx=mx.cpu()):
Review comment:
`@pytest.mark.parametrize`
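A sketch of what the suggested parametrization could look like (the dtype pairs below are illustrative, not the test's actual matrix):

```python
import pytest

# Hypothetical reshaping of run_rnn_layers into a parametrized test; the
# decorated function remains a plain callable, and pytest runs each pair
# as its own test case.
@pytest.mark.parametrize('dtype,dtype2', [
    ('float32', 'float32'),
    ('float64', 'float32'),
])
def test_rnn_layers_dtype(dtype, dtype2):
    # stand-in body: the real test would build and run the RNN layers here
    assert dtype in ('float32', 'float64')
    assert dtype2 == 'float32'
```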
##########
File path: python/mxnet/gluon/data/vision/transforms/__init__.py
##########
@@ -129,10 +131,8 @@ def __init__(self, dtype='float32'):
super(Cast, self).__init__()
self._dtype = dtype
- def hybrid_forward(self, F, *args):
- if is_np_array():
- F = F.npx
- return tuple([F.cast(x, self._dtype) for x in args])
+ def forward(self, *args):
+ return tuple([x.astype(self._dtype) for x in args])
Review comment:
```suggestion
return tuple(x.astype(self._dtype) for x in args)
```
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -152,190 +124,7 @@ def test_lstmp():
check_rnn_states(fused_states, stack_states, num_layers, True)
-@assert_raises_cudnn_not_satisfied(min_version='5.1.10')
-def test_lstm_cpu_inference():
- # should behave the same as lstm cell
- EXPECTED_LSTM_OUTPUT = np.array([[[0.72045636, 0.72045636, 0.95215213, 0.95215213],
- [0.72045636, 0.72045636, 0.95215213, 0.95215213]],
- [[0.95215213, 0.95215213, 0.72045636, 0.72045636],
- [0.95215213, 0.95215213, 0.72045636, 0.72045636]]])
- x = mx.nd.ones(shape=(2, 2, 2))
- model = mx.gluon.rnn.LSTM(2, num_layers=6, bidirectional=True)
- model.initialize(mx.init.One())
-
- y = model(x).asnumpy()
- mx.test_utils.assert_almost_equal(y, EXPECTED_LSTM_OUTPUT,
- rtol=1e-3, atol=1e-5)
-
-
-def test_gru():
- cell = gluon.rnn.GRUCell(100, activation='relu', recurrent_activation='tanh')
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs)
- outputs = mx.sym.Group(outputs)
- assert sorted(cell.collect_params().keys()) == ['h2h_bias', 'h2h_weight', 'i2h_bias', 'i2h_weight']
- assert outputs.list_outputs() == [
- 'grucell_t0_out_output', 'grucell_t1_out_output',
- 'grucell_t2_out_output'
- ]
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10,50), t1_data=(10,50), t2_data=(10,50))
- assert outs == [(10, 100), (10, 100), (10, 100)]
-
-
[email protected]
-def test_residual():
- cell = gluon.rnn.ResidualCell(gluon.rnn.GRUCell(50))
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(2)]
- outputs, _ = cell.unroll(2, inputs)
- outputs = mx.sym.Group(outputs)
- params = cell.collect_params()
- assert sorted(params.keys()) == \
- ['base_cell.h2h_bias', 'base_cell.h2h_weight', 'base_cell.i2h_bias', 'base_cell.i2h_weight']
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10, 50), t1_data=(10, 50))
- assert outs == [(10, 50), (10, 50)]
- outputs = outputs.eval(**{'t0_data': mx.nd.ones((10, 50)),
- 't1_data': mx.nd.ones((10, 50)),
- cell.base_cell.i2h_weight.var().name: mx.nd.zeros((150, 50)),
- cell.base_cell.i2h_bias.var().name: mx.nd.zeros((150, )),
- cell.base_cell.h2h_weight.var().name: mx.nd.zeros((150, 50)),
- cell.base_cell.h2h_bias.var().name: mx.nd.zeros((150, ))})
- expected_outputs = np.ones((10, 50))
- assert np.array_equal(outputs[0].asnumpy(), expected_outputs)
- assert np.array_equal(outputs[1].asnumpy(), expected_outputs)
-
-
[email protected]
-def test_residual_bidirectional():
- cell = gluon.rnn.ResidualCell(
- gluon.rnn.BidirectionalCell(
- gluon.rnn.GRUCell(25),
- gluon.rnn.GRUCell(25)))
- inputs = [mx.sym.Variable('rnn_t%d_data'%i) for i in range(2)]
- outputs, _ = cell.unroll(2, inputs, merge_outputs=False)
- outputs = mx.sym.Group(outputs)
- params = cell.collect_params()
- assert sorted(params.keys()) == \
- ['base_cell.l_cell.h2h_bias', 'base_cell.l_cell.h2h_weight',
- 'base_cell.l_cell.i2h_bias', 'base_cell.l_cell.i2h_weight',
- 'base_cell.r_cell.h2h_bias', 'base_cell.r_cell.h2h_weight',
- 'base_cell.r_cell.i2h_bias', 'base_cell.r_cell.i2h_weight']
- # assert outputs.list_outputs() == \
- # ['bi_t0_plus_residual_output', 'bi_t1_plus_residual_output']
-
- args, outs, auxs = outputs.infer_shape(rnn_t0_data=(10, 50), rnn_t1_data=(10, 50))
- assert outs == [(10, 50), (10, 50)]
- outputs = outputs.eval(**{'rnn_t0_data':mx.nd.ones((10, 50))+5,
- 'rnn_t1_data':mx.nd.ones((10, 50))+5,
- cell.base_cell.l_cell.i2h_weight.var().name:mx.nd.zeros((75, 50)),
- cell.base_cell.l_cell.i2h_bias.var().name:mx.nd.zeros((75,)),
- cell.base_cell.l_cell.h2h_weight.var().name:mx.nd.zeros((75, 25)),
- cell.base_cell.l_cell.h2h_bias.var().name:mx.nd.zeros((75,)),
- cell.base_cell.r_cell.i2h_weight.var().name:mx.nd.zeros((75, 50)),
- cell.base_cell.r_cell.i2h_bias.var().name:mx.nd.zeros((75,)),
- cell.base_cell.r_cell.h2h_weight.var().name:mx.nd.zeros((75, 25)),
- cell.base_cell.r_cell.h2h_bias.var().name:mx.nd.zeros((75,))})
- expected_outputs = np.ones((10, 50))+5
- assert np.array_equal(outputs[0].asnumpy(), expected_outputs)
- assert np.array_equal(outputs[1].asnumpy(), expected_outputs)
-
-
-def test_stack():
- cell = gluon.rnn.SequentialRNNCell()
- for i in range(5):
- if i == 1:
- cell.add(gluon.rnn.ResidualCell(gluon.rnn.LSTMCell(100)))
- else:
- cell.add(gluon.rnn.LSTMCell(100))
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs)
- outputs = mx.sym.Group(outputs)
- keys = sorted(cell.collect_params().keys())
- for i in range(5):
- if i==1:
- continue
- assert '%d.h2h_weight'%i in keys
- assert '%d.h2h_bias'%i in keys
- assert '%d.i2h_weight'%i in keys
- assert '%d.i2h_bias'%i in keys
- assert '1.base_cell.h2h_weight' in keys
- assert '1.base_cell.h2h_bias' in keys
- assert '1.base_cell.i2h_weight' in keys
- assert '1.base_cell.i2h_bias' in keys
- assert outputs.list_outputs() == ['lstmcell_t0_out_output', 'lstmcell_t1_out_output', 'lstmcell_t2_out_output']
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10,50), t1_data=(10,50), t2_data=(10,50))
- assert outs == [(10, 100), (10, 100), (10, 100)]
-
-
[email protected]
-def test_hybridstack():
- cell = gluon.rnn.HybridSequentialRNNCell()
- for i in range(5):
- if i == 1:
- cell.add(gluon.rnn.ResidualCell(gluon.rnn.LSTMCell(100)))
- else:
- cell.add(gluon.rnn.LSTMCell(100))
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs)
- outputs = mx.sym.Group(outputs)
- keys = sorted(cell.collect_params().keys())
- for i in range(5):
- if i==1:
- continue
- assert '%d.h2h_weight'%i in keys
- assert '%d.h2h_bias'%i in keys
- assert '%d.i2h_weight'%i in keys
- assert '%d.i2h_bias'%i in keys
- assert '1.base_cell.h2h_weight' in keys
- assert '1.base_cell.h2h_bias' in keys
- assert '1.base_cell.i2h_weight' in keys
- assert '1.base_cell.i2h_bias' in keys
- assert outputs.list_outputs() == ['lstmcell_t0_out_output', 'lstmcell_t1_out_output', 'lstmcell_t2_out_output']
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10,50), t1_data=(10,50), t2_data=(10,50))
- assert outs == [(10, 100), (10, 100), (10, 100)]
-
- # Test HybridSequentialRNNCell nested in nn.HybridBlock, SequentialRNNCell will fail in this case
- class BidirectionalOfSequential(gluon.HybridBlock):
- def __init__(self):
- super(BidirectionalOfSequential, self).__init__()
-
- cell0 = gluon.rnn.HybridSequentialRNNCell()
- cell0.add(gluon.rnn.LSTMCell(100))
- cell0.add(gluon.rnn.LSTMCell(100))
-
- cell1 = gluon.rnn.HybridSequentialRNNCell()
- cell1.add(gluon.rnn.LSTMCell(100))
- cell1.add(gluon.rnn.LSTMCell(100))
-
- self.rnncell = gluon.rnn.BidirectionalCell(cell0, cell1)
-
- def hybrid_forward(self, F, x):
- return self.rnncell.unroll(3, x, layout="NTC", merge_outputs=True)
-
- x = mx.nd.random.uniform(shape=(10, 3, 100))
- net = BidirectionalOfSequential()
- net.initialize()
- outs, _ = net(x)
-
- assert outs.shape == (10, 3, 200)
-
-
-def test_bidirectional():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_loss.py
##########
@@ -16,91 +16,98 @@
# under the License.
import mxnet as mx
-import numpy as np
+import numpy as _np
Review comment:
use `onp` as the name for consistency
##########
File path: tests/python/unittest/test_gluon.py
##########
@@ -27,10 +27,9 @@
from mxnet.util import is_np_array
from mxnet.ndarray.ndarray import _STORAGE_TYPE_STR_TO_ID
from mxnet.test_utils import use_np
-import mxnet.numpy as _mx_np
from common import assertRaises, assert_raises_cudnn_not_satisfied, \
xfail_when_nonstandard_decimal_separator, environment
-import numpy as np
+import numpy as _np
Review comment:
use `onp` as the name to be consistent
##########
File path: python/mxnet/gluon/nn/basic_layers.py
##########
@@ -123,31 +122,9 @@ def add(self, *blocks):
self.register_child(block)
def __call__(self, *args, **kwargs):
- if self._active and not self._v2_checked and not dc.is_deferred_compute():
- # If any of the child Blocks implements the Gluon 2 interface, the
- # container must not pass a Symbol to them
- if any(inspect.unwrap(chld().hybrid_forward.__func__) is
- HybridBlock.hybrid_forward for chld in self._children.values()):
- self._v2 = True
- self._v2_checked = True
- self.forward = self._forward
-
return super().__call__(*args, **kwargs)
Review comment:
if there's no override, this method is not needed anymore.
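In plain Python terms (a generic sketch, not the Gluon code itself): an override whose body only delegates to `super()` can be deleted without changing behavior, since attribute lookup falls back to the base class.

```python
# Sketch of the point above: a __call__ override that only forwards to
# super() is redundant; removing it leaves behavior unchanged.
class Base:
    def __call__(self, x):
        return x + 1

class WithRedundantOverride(Base):
    def __call__(self, x):          # does nothing beyond delegating
        return super().__call__(x)

class WithoutOverride(Base):        # override removed entirely
    pass

assert WithRedundantOverride()(1) == WithoutOverride()(1) == 2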
##########
File path: src/api/operator/numpy_extension/npx_pooling_op.cc
##########
@@ -97,14 +97,19 @@ MXNET_REGISTER_API("_npx.pooling")
} else {
param.kernel = TShape(args[1].operator ObjectRef());
}
-
+ // global pool
+ param.global_pool = args[6].operator bool();
// stride
if (args[2].type_code() == kNull) {
if (param.kernel.ndim() == 1) {
param.stride = mshadow::Shape1(1);
} else if (param.kernel.ndim() == 2) {
param.stride = mshadow::Shape2(1, 1);
} else {
+ if (param.global_pool == false) {
+ CHECK_EQ(param.kernel.ndim(), 3U) << param.kernel.ndim()
+ << "D pooling not supported";
Review comment:
```suggestion
<< "D pooling not supported. Only 1D, 2D, and 3D pooling are
supported.";
```
##########
File path: docs/python_docs/python/tutorials/packages/gluon/loss/custom-loss.md
##########
@@ -45,7 +45,7 @@ import random
The loss function uses a margin *m* which has the effect that dissimilar
pairs only contribute if their loss is within a certain margin.
-In order to implement such a customized loss function in Gluon, we only need
to define a new class that is inheriting from the
[Loss](../../../../api/gluon/loss/index.rst#mxnet.gluon.loss.Loss) base class.
We then define the contrastive loss logic in the
[hybrid_forward](../../../../api/gluon/hybrid_block.rst#mxnet.gluon.HybridBlock.hybrid_forward)
method. This method takes the images `image1`, `image2` and the label which
defines whether `image1` and `image2` are similar (=0) or dissimilar (=1).
The input F is an `mxnet.ndarry` or an `mxnet.symbol` if we hybridize the
network. Gluon's `Loss` base class is in fact a
[HybridBlock](../../../../api/gluon/hybrid_block.rst#mxnet.gluon.HybridBlock).
This means we can either run imperatively or symbolically. When we hybridize
our custom loss function, we can get performance speedups.
+In order to implement such a customized loss function in Gluon, we only need
to define a new class that is inheriting from the
[Loss](../../../../api/gluon/loss/index.rst#mxnet.gluon.loss.Loss) base class.
We then define the contrastive loss logic in the
[forward](../../../../api/gluon/hybrid_block.rst#mxnet.gluon.HybridBlock.forward)
method. This method takes the images `image1`, `image2` and the label which
defines whether `image1` and `image2` are similar (=0) or dissimilar (=1).
The input F is an `mxnet.ndarry` or an `mxnet.symbol` if we hybridize the
network. Gluon's `Loss` base class is in fact a
[HybridBlock](../../../../api/gluon/hybrid_block.rst#mxnet.gluon.HybridBlock).
This means we can either run imperatively or symbolically. When we hybridize
our custom loss function, we can get performance speedups.
Review comment:
```suggestion
In order to implement such a customized loss function in Gluon, we just need
to define a new class that is inheriting from the
[Loss](../../../../api/gluon/loss/index.rst#mxnet.gluon.loss.Loss) base class.
We then define the contrastive loss logic in the
[forward](../../../../api/gluon/hybrid_block.rst#mxnet.gluon.HybridBlock.forward)
method. This method takes the images `image1`, `image2` and the label which
defines whether `image1` and `image2` are similar (=0) or dissimilar (=1).
Gluon's `Loss` base class is in fact a
[HybridBlock](../../../../api/gluon/hybrid_block.rst#mxnet.gluon.HybridBlock),
and when we hybridize our custom loss function, we can get performance speedups.
```
##########
File path: python/mxnet/ndarray/numpy_extension/_op.py
##########
@@ -397,12 +397,15 @@ def fully_connected(x, weight, bias=None, num_hidden=None,
The output of this function.
"""
assert num_hidden is not None, "Please provide number of hidden nodes"
+ if bias is not None:
+ return _api_internal.fully_connected(x, weight, bias, num_hidden,
+ False, flatten)
if no_bias:
- return _api_internal.fully_connected(x, weight, num_hidden, no_bias, flatten)
+ return _api_internal.fully_connected(x, weight, num_hidden, True, flatten)
else:
assert bias is not None, "Missing bias parameter"
return _api_internal.fully_connected(x, weight, bias, num_hidden,
- no_bias, flatten)
+ False, flatten)
Review comment:
the first condition didn't take into account the `no_bias` argument for
the following case:
> If ``no_bias`` is set to be true, then the ``bias`` term is ignored.
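A hedged sketch of the precedence being pointed out (plain Python modeling only the branching, with hypothetical return markers instead of the real `_api_internal` call): `no_bias=True` should win even when a `bias` argument is supplied.

```python
# Sketch of the argument resolution the reviewer describes: when no_bias
# is True, the bias term is ignored even if one was passed in.
def resolve_fc_args(bias=None, no_bias=False):
    if no_bias:
        return ('no-bias-call',)          # bias dropped, per the docstring
    assert bias is not None, "Missing bias parameter"
    return ('bias-call', bias)

# no_bias takes precedence over a supplied bias
assert resolve_fc_args(bias=[0.0], no_bias=True) == ('no-bias-call',)
assert resolve_fc_args(bias=[0.0]) == ('bias-call', [0.0])
```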
##########
File path: tests/python/unittest/test_gluon.py
##########
@@ -353,145 +240,16 @@ def test_sparse_hybrid_block():
params['bias'] = gluon.Parameter('bias', shape=(5), dtype='float32')
net = gluon.nn.Dense(5).share_parameters(params)
net.initialize()
- x = mx.nd.ones((2,5))
+ x = mx.np.ones((2,5))
with pytest.raises(RuntimeError):
# an exception is expected when forwarding a HybridBlock w/ sparse param
y = net(x)
-def test_hybrid_block_none_args():
- class Foo(gluon.HybridBlock):
- def hybrid_forward(self, F, a, b):
- if a is None and b is not None:
- return b
- elif b is None and a is not None:
- return a
- elif a is not None and b is not None:
- return a + b
- else:
- raise NotImplementedError
-
- class FooDefault(gluon.HybridBlock):
- def hybrid_forward(self, F, a, b=None):
- if a is None and b is not None:
- return b
- elif b is None and a is not None:
- return a
- elif a is not None and b is not None:
- return a + b
- else:
- raise NotImplementedError
-
-
- class FooNested(gluon.HybridBlock):
- def __init__(self):
- super(FooNested, self).__init__()
- self.f1 = Foo()
- self.f2 = Foo()
- self.f3 = Foo()
-
- def hybrid_forward(self, F, a, b):
- data = self.f1(a, b)
- data = self.f2(a, data)
- data = self.f3(data, b)
- return data
-
- for arg_inputs in [(None, mx.nd.ones((10,))),
- (mx.nd.ones((10,)), mx.nd.ones((10,))),
- (mx.nd.ones((10,)), None)]:
- foo1 = FooNested()
- foo1.hybridize()
- foo2 = FooNested()
- for _ in range(2): # Loop for 2 times to trigger forwarding of the cached version
- out1 = foo1(*arg_inputs)
- out2 = foo2(*arg_inputs)
- if isinstance(out1, tuple):
- for lhs, rhs in zip(out1, out2):
- assert_almost_equal(lhs.asnumpy(), rhs.asnumpy())
- else:
- assert_almost_equal(out1.asnumpy(), out2.asnumpy())
- for do_hybridize in [True, False]:
- foo = FooNested()
- if do_hybridize:
- foo.hybridize()
- pytest.raises(ValueError, foo, None, None)
-
- # Make sure the ValueError is correctly raised
- foo = FooNested()
- foo.hybridize()
- foo(None, mx.nd.ones((10,))) # Pass for the first time to initialize the cached op
- pytest.raises(ValueError, lambda: foo(mx.nd.ones((10,)), mx.nd.ones((10,))))
- foo = FooNested()
- pytest.raises(ValueError, lambda: foo(mx.nd.ones((10,)), mx.sym.var('a')))
- foo = FooNested()
- pytest.raises(ValueError, lambda: foo(mx.sym.var('a'), mx.nd.ones((10,))))
-
- # Test the case of the default values
- foo1 = FooDefault()
- foo1.hybridize()
- foo2 = FooDefault()
- out1 = foo1(mx.nd.ones((10,)))
- out2 = foo2(mx.nd.ones((10,)))
- out3 = foo1(mx.nd.ones((10,)), None)
- out4 = foo2(mx.nd.ones((10,)), None)
- assert_almost_equal(out1.asnumpy(), out2.asnumpy())
- assert_almost_equal(out1.asnumpy(), out3.asnumpy())
- assert_almost_equal(out1.asnumpy(), out4.asnumpy())
- foo1 = FooDefault()
- foo1.hybridize()
- out1 = foo1(mx.nd.ones((10,)), None)
- out2 = foo1(mx.nd.ones((10,)))
- assert_almost_equal(out1.asnumpy(), out2.asnumpy())
- pytest.raises(ValueError, lambda: foo1(mx.nd.ones((10,)), mx.nd.ones((10,))))
-
-
-def test_hybrid_block_hybrid_no_hybrid():
- class FooHybrid(gluon.HybridBlock):
- def hybrid_forward(self, F, a, b):
- if isinstance(a, (list, tuple)):
- a = sum(a)
- if isinstance(b, (list, tuple)):
- b = sum(b)
- return a + b
-
- class Foo(gluon.Block):
- def forward(self, a, b):
- if isinstance(a, (list, tuple)):
- a = sum(a)
- if isinstance(b, (list, tuple)):
- b = sum(b)
- return a + b
- # When hybridize is not called, HybridBlock acts the same as Block
- foo_hybrid = FooHybrid()
- foo = Foo()
- for a, b in [(mx.nd.ones((10,)), 1),
- (mx.nd.ones((20,)), 2),
- ([mx.nd.ones((10,)), mx.nd.ones((10,))],
- [mx.nd.ones((10)), mx.nd.ones((10,)), mx.nd.ones((10,))]),
- ([mx.nd.ones((10,)), mx.nd.ones((10,))], 3)]:
- hybrid_block_out = foo_hybrid(a, b)
- block_out = foo(a, b)
- assert_almost_equal(hybrid_block_out.asnumpy(), block_out.asnumpy())
- # When hybridize is called, we need to make sure that the model raises for the unsupported cases
- # 1. Scalar values in the input
- # 2. No mixing of sym/ndarray
- # 3. No mixing of cpu ndarray and gpu ndarray (Tested in gpu/test_gluon_gpu.py)
- # 4. Allow mixing of cpu_pinned and cpu
Review comment:
use cases 1, 3, and 4 are still relevant
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -152,190 +124,7 @@ def test_lstmp():
check_rnn_states(fused_states, stack_states, num_layers, True)
-@assert_raises_cudnn_not_satisfied(min_version='5.1.10')
-def test_lstm_cpu_inference():
- # should behave the same as lstm cell
- EXPECTED_LSTM_OUTPUT = np.array([[[0.72045636, 0.72045636, 0.95215213, 0.95215213],
- [0.72045636, 0.72045636, 0.95215213, 0.95215213]],
- [[0.95215213, 0.95215213, 0.72045636, 0.72045636],
- [0.95215213, 0.95215213, 0.72045636, 0.72045636]]])
- x = mx.nd.ones(shape=(2, 2, 2))
- model = mx.gluon.rnn.LSTM(2, num_layers=6, bidirectional=True)
- model.initialize(mx.init.One())
-
- y = model(x).asnumpy()
- mx.test_utils.assert_almost_equal(y, EXPECTED_LSTM_OUTPUT,
- rtol=1e-3, atol=1e-5)
-
-
-def test_gru():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -152,190 +124,7 @@ def test_lstmp():
check_rnn_states(fused_states, stack_states, num_layers, True)
-@assert_raises_cudnn_not_satisfied(min_version='5.1.10')
-def test_lstm_cpu_inference():
- # should behave the same as lstm cell
- EXPECTED_LSTM_OUTPUT = np.array([[[0.72045636, 0.72045636, 0.95215213, 0.95215213],
- [0.72045636, 0.72045636, 0.95215213, 0.95215213]],
- [[0.95215213, 0.95215213, 0.72045636, 0.72045636],
- [0.95215213, 0.95215213, 0.72045636, 0.72045636]]])
- x = mx.nd.ones(shape=(2, 2, 2))
- model = mx.gluon.rnn.LSTM(2, num_layers=6, bidirectional=True)
- model.initialize(mx.init.One())
-
- y = model(x).asnumpy()
- mx.test_utils.assert_almost_equal(y, EXPECTED_LSTM_OUTPUT,
- rtol=1e-3, atol=1e-5)
-
-
-def test_gru():
- cell = gluon.rnn.GRUCell(100, activation='relu', recurrent_activation='tanh')
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs)
- outputs = mx.sym.Group(outputs)
- assert sorted(cell.collect_params().keys()) == ['h2h_bias', 'h2h_weight', 'i2h_bias', 'i2h_weight']
- assert outputs.list_outputs() == [
- 'grucell_t0_out_output', 'grucell_t1_out_output',
- 'grucell_t2_out_output'
- ]
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10,50), t1_data=(10,50), t2_data=(10,50))
- assert outs == [(10, 100), (10, 100), (10, 100)]
-
-
[email protected]
-def test_residual():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_contrib_control_flow.py
##########
@@ -1670,52 +1615,14 @@ def hybrid_forward(self, F, data):
assert_almost_equal(res1.asnumpy(), res2.asnumpy(), rtol=1e-3, atol=1e-3)
-def test_scope():
Review comment:
why is this test removed?
##########
File path: tests/python/unittest/test_gluon.py
##########
@@ -1063,52 +821,30 @@ def test_sequential_warning():
assert len(w) == 1
-def test_global_norm_clip():
- stypes = ['default', 'row_sparse']
- def check_global_norm_clip(stype, check_isfinite):
Review comment:
default global norm clip is still relevant
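For reference, the dense case could be exercised with a sketch like this (pure-Python stand-in for the arrays; `clip_global_norm` here is a hypothetical re-implementation of the idea, not MXNet's function):

```python
import math

# Sketch of dense global norm clipping: compute the combined L2 norm over
# all gradient arrays and rescale everything by max_norm / total_norm when
# the combined norm exceeds max_norm.
def clip_global_norm(arrays, max_norm):
    total_norm = math.sqrt(sum(x * x for arr in arrays for x in arr))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        arrays = [[x * scale for x in arr] for arr in arrays]
    return arrays, total_norm

clipped, norm = clip_global_norm([[3.0], [4.0]], max_norm=1.0)
assert norm == 5.0
assert abs(clipped[0][0] - 0.6) < 1e-12 and abs(clipped[1][0] - 0.8) < 1e-12
```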
##########
File path: tests/python/unittest/test_gluon.py
##########
@@ -1493,55 +1237,24 @@ def __init__(self, b1, b2):
# Test default behavior
c.save_parameters(param_path, deduplicate=False)
- params = mx.nd.load(param_path)
+ params = mx.npx.load(param_path)
assert len(params) == 2 # Only a single copy of the shared parameter is saved
b1 = B()
b2 = B().share_parameters(b1.collect_params())
c = C(b1, b2)
c.load_parameters(param_path)
-def test_symbol_block_save_load(tmpdir):
- tmp = str(tmpdir)
- tmpfile = os.path.join(tmp, 'resnet34_fp64')
Review comment:
@leezu how shall we support the inference use case for exported graph?
Should we keep the SymbolBlock?
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -152,190 +124,7 @@ def test_lstmp():
check_rnn_states(fused_states, stack_states, num_layers, True)
-@assert_raises_cudnn_not_satisfied(min_version='5.1.10')
-def test_lstm_cpu_inference():
Review comment:
why removed?
##########
File path: python/mxnet/numpy_extension/control_flow.py
##########
@@ -0,0 +1,390 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Control flow operators for imperative programming in the numpy_extension namespace."""
+
+from ..util import set_module
+from ..ndarray import NDArray
+from ..base import _as_list
+from .. import numpy as _mx_np
+
+
+__all__ = ["foreach", "while_loop", "cond"]
+
+
+def _flatten(args, inout_str):
+ if isinstance(args, NDArray):
+ return [args], int(0)
+
+ assert isinstance(args, (list, tuple)), \
+ "%s must be (nested) list of NDArray, " \
+ "but got %s of type %s"%(inout_str, str(args), str(type(args)))
+ flat = []
+ fmts = []
+ for i in args:
+ arg, fmt = _flatten(i, inout_str)
+ flat.extend(arg)
+ fmts.append(fmt)
+ return flat, fmts
+
+
+def _regroup(args, fmt):
+ if isinstance(fmt, int):
+ if fmt == 0:
+ return args[0], args[1:]
+ return args[:fmt], args[fmt:]
+
+ assert isinstance(args, (list, tuple)), \
+ "output must be (nested) list of NDArray, " \
+ "but got %s of type %s"%(str(args), str(type(args)))
+ ret = []
+ for i in fmt:
+ res, args = _regroup(args, i)
+ ret.append(res)
+ return ret, args
+
+
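The `_flatten`/`_regroup` pair above encodes a nested list of arrays into a flat list plus a format tree, and decodes it back. A standalone sketch of the same round-trip (plain Python floats standing in for NDArrays; `flatten`/`regroup` are illustrative names, not the module's private helpers):

```python
# Standalone sketch of the _flatten/_regroup round-trip, using floats
# in place of NDArrays. The format tree records the nesting structure.

def flatten(args):
    # A leaf contributes itself and the marker 0.
    if not isinstance(args, (list, tuple)):
        return [args], 0
    flat, fmts = [], []
    for a in args:
        sub, fmt = flatten(a)
        flat.extend(sub)
        fmts.append(fmt)
    return flat, fmts

def regroup(args, fmt):
    # Marker 0 means "take one leaf"; a list means "rebuild this level".
    if isinstance(fmt, int):
        if fmt == 0:
            return args[0], args[1:]
        return args[:fmt], args[fmt:]
    ret = []
    for f in fmt:
        res, args = regroup(args, f)
        ret.append(res)
    return ret, args

nested = [1.0, [2.0, [3.0, 4.0]], 5.0]
flat, fmt = flatten(nested)
rebuilt, rest = regroup(flat, fmt)
print(flat)     # [1.0, 2.0, 3.0, 4.0, 5.0]
print(rebuilt)  # [1.0, [2.0, [3.0, 4.0]], 5.0]
```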
+@set_module('mxnet.numpy_extension')
+def foreach(body, data, init_states):
+    """Run a for loop with user-defined computation over NDArrays on dimension 0.
+
+    This operator simulates a for loop, where body defines the computation for one
+    iteration of the loop. It runs the computation in body on each slice from the
+    input NDArrays.
+
+    body takes two arguments as input and outputs a tuple of two elements,
+    as illustrated below::
+
+    out, states = body(data1, states)
+
+    data1 can be either an NDArray or a list of NDArrays. If data is an NDArray,
+    data1 is an NDArray. Otherwise, data1 is a list of NDArrays and has the same
+    size as data. states is a list of NDArrays and has the same size as init_states.
+    Similarly, out can be either an NDArray or a list of NDArrays, which are
+    concatenated as the first output of foreach; states from the last execution of
+    body are the second output of foreach.
+
+    The computation done by this operator is equivalent to the pseudo code below
+    when the input data is an NDArray::
+
+    states = init_states
+    outs = []
+    for i in range(data.shape[0]):
+        s = data[i]
+        out, states = body(s, states)
+        outs.append(out)
+    outs = stack(*outs)
+
+
+ Parameters
+ ----------
+ body : a Python function.
+ Define computation in an iteration.
+ data: an NDArray or a list of NDArrays.
+ The input data.
+ init_states: an NDArray or nested lists of NDArrays.
+ The initial values of the loop states.
+
+ Returns
+ -------
+ outputs: an NDArray or nested lists of NDArrays.
+ The output data concatenated from the output of all iterations.
+ states: an NDArray or nested lists of NDArrays.
+ The loop states in the last iteration.
+
+ Examples
+ --------
+ >>> step = lambda data, states: (data + states[0], [states[0] * 2])
+ >>> data = mx.np.random.uniform(size=(2, 10))
+ >>> states = [mx.np.random.uniform(size=(10))]
+ >>> outs, states = mx.npx.foreach(step, data, states)
+ """
+
+ def check_input(inputs, in_type, msg):
+ is_NDArray_or_list = True
+ if isinstance(inputs, list):
+ for i in inputs:
+ if not isinstance(i, in_type):
+ is_NDArray_or_list = False
+ break
+ else:
+ is_NDArray_or_list = isinstance(inputs, in_type)
+ assert is_NDArray_or_list, msg
+
+ flatten, _ = _flatten(data, "foreach input")
+ check_input(flatten, NDArray,
+ "data should be an NDArray or a nested list of NDArrays")
+ flatten, _ = _flatten(init_states, "foreach states")
+ check_input(flatten, NDArray,
+ "init_states should be an NDArray or a nested list of NDArrays")
+
+ not_data_list = isinstance(data, NDArray)
+ num_iters = data.shape[0] if not_data_list else data[0].shape[0]
+ states = init_states
+ outputs = []
+ for i in range(num_iters):
+ if not_data_list:
+ eles = data[i]
+ else:
+ eles = [d[i] for d in data]
+ outs, states = body(eles, states)
+ outs, out_fmt = _flatten(outs, "foreach output")
+ outputs.append(outs)
+ outputs = zip(*outputs)
+ tmp_outputs = []
+ for out in outputs:
+ tmp_outputs.append(_mx_np.stack(out))
+ outputs = tmp_outputs
+ outputs, _ = _regroup(outputs, out_fmt)
+
+ return (outputs, states)
+
+
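Since the imperative `foreach` above is just a Python loop over slices along axis 0, its semantics can be checked against plain NumPy. A minimal sketch (pure NumPy, independent of MXNet; `np_foreach` is an illustrative name):

```python
import numpy as np

def np_foreach(body, data, init_states):
    # Run body on each slice data[i] along axis 0, threading the states
    # through, then stack the per-step outputs along a new axis 0.
    states = init_states
    outs = []
    for i in range(data.shape[0]):
        out, states = body(data[i], states)
        outs.append(out)
    return np.stack(outs), states

step = lambda d, s: (d + s[0], [s[0] * 2])
data = np.arange(6, dtype=np.float64).reshape(2, 3)
states = [np.ones(3)]
outs, final = np_foreach(step, data, states)
print(outs)   # step 0: data[0] + 1; step 1: data[1] + 2
print(final)  # [array([4., 4., 4.])]
```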
+#pylint: disable=W0621
+@set_module('mxnet.numpy_extension')
+def while_loop(cond, func, loop_vars, max_iterations=None):
+ """Run a while loop with user-defined computation and loop condition.
+
+ This operator simulates a while loop that iteratively runs customized computation
+ as long as the condition is satisfied.
+
+ `loop_vars` is a list of NDArrays that the computation uses.
+
+ `cond` is a user-defined function, used as the loop condition.
+ It consumes `loop_vars`, and produces a scalar MXNet NDArray,
+ indicating the termination of the loop.
+ The loop ends when `cond` returns false (zero).
+ `cond` is variadic, and its signature should be
+ `cond(*loop_vars) => NDArray`.
+
+ `func` is a user-defined function, used as the loop body.
+ It also consumes `loop_vars`, and produces `step_output` and `new_loop_vars` at each step.
+ In each step, `step_output` should contain the same number of elements.
+ Through all steps, the i-th element of `step_output` should have the same shape and dtype.
+ Also, `new_loop_vars` should contain the same number of elements as `loop_vars`,
+ and the corresponding element should have the same shape and dtype.
+ `func` is variadic, and its signature should be
+ `func(*loop_vars) =>
+ (NDArray or nested List[NDArray] step_output, NDArray or nested List[NDArray] new_loop_vars)`.
+
+ `max_iterations` is a scalar that defines the maximum number of iterations allowed.
+
+ This function returns two lists.
+ The first list has the length of `|step_output|`,
+ in which the i-th element is the i-th element of
+ `step_output` from all steps, stacked along axis 0.
+ The second list has the length of `|loop_vars|`,
+ and represents the final states of the loop variables.
+
+ .. warning::
+
+    For now, the axis 0 of all NDArrays in the first list is `max_iterations`,
+    due to the lack of dynamic shape inference.
+
+ .. warning::
+
+    When `cond` is never satisfied, we assume `step_output` is empty,
+    because it cannot be inferred. This is different from the symbolic version.
+
+ Parameters
+ ----------
+ cond: a Python function.
+ The loop condition.
+ func: a Python function.
+ The loop body.
+ loop_vars: an NDArray or nested lists of NDArrays.
+ The initial values of the loop variables.
+ max_iterations: a python int.
+ Maximum number of iterations.
+
+ Returns
+ ------
+ outputs: an NDArray or nested lists of NDArrays
+ stacked output from each step
+ states: an NDArray or nested lists of NDArrays
+ final state
+
+ Examples
+ --------
+ >>> cond = lambda i, s: i <= 5
+ >>> func = lambda i, s: ([i + s], [i + 1, s + i])
+ >>> loop_vars = (mx.np.array([0], dtype="int64"), mx.np.array([1], dtype="int64"))
+ >>> outputs, states = mx.npx.while_loop(cond, func, loop_vars, max_iterations=10)
+ >>> outputs
+ [array([[ 1],
+        [ 2],
+        [ 4],
+        [ 7],
+        [11],
+        [16],
+        [ 0],
+        [ 0],
+        [ 0],
+        [ 0]], dtype=int64)]
+ >>> states
+ [array([6], dtype=int64), array([16], dtype=int64)]
+ """
+ def _to_python_scalar(inputs, type_, name):
+ """Converts "inputs", possibly a typed MXNet NDArray, a numpy ndarray, or
+ another Python type, to the given type.
+ """
+ if isinstance(inputs, NDArray):
+ inputs = inputs.item()
+ try:
+ inputs = type_(inputs)
+ except (TypeError, ValueError):
+ raise ValueError("Cannot convert %s to python %s" % (name, type_.__name__))
+ return inputs
+
+ def _func_wrapper(loop_vars):
+ """This wrapper unifies
+ "func: loop_vars -> new_loop_vars"
+ and "func: loop_vars -> (step_output, new_loop_vars)"
+ into "func: loop_vars -> (None or tuple of step_outputs, tuple of new_loop_vars)"
+ """
+ step_output, new_loop_vars = func(*loop_vars)
+ if step_output is None:
+ step_output = []
+ if new_loop_vars is None:
+ new_loop_vars = []
+ if isinstance(step_output, tuple):
+ step_output = list(step_output)
+ if isinstance(new_loop_vars, tuple):
+ new_loop_vars = list(new_loop_vars)
+ new_loop_vars = _as_list(new_loop_vars)
+ if len(loop_vars) != len(new_loop_vars):
+ raise ValueError("The length of loop_vars should be consistent during the loop")
+ return step_output, new_loop_vars
+
+ if max_iterations is None:
+ raise ValueError("max_iterations should be specified")
+ max_iterations = _to_python_scalar(max_iterations, int, "max_iteration")
+ # This should also work if loop_vars is empty,
+ # but it is semantically unnecessary to support that case.
+ if len(loop_vars) == 0:
+ raise ValueError("loop_vars should contain at least one element")
+
+ steps = 0
+ outputs = []
+ # there might not be an iteration.
+ out_fmt = None
+ not_loop_var_list = isinstance(loop_vars, NDArray)
+ loop_vars = _as_list(loop_vars)
+ while steps < max_iterations and \
+ _to_python_scalar(cond(*loop_vars), bool, "Return value of cond"):  # loop condition
+ step_output, loop_vars = _func_wrapper(loop_vars)
+ step_output, out_fmt = _flatten(step_output, "while output")
+ outputs.append(step_output)
+ steps += 1
+ if len(outputs) != steps or len(step_output) != len(outputs[0]):
+ raise ValueError("Number of elements in step_output should be the same in each step")
+ stacked_outputs = []
+ for i_th, items in enumerate(zip(*outputs), 1):
+ # `mx.nd.pad` only supports 4-D or 5-D inputs,
+ # so we use `mx.np.pad` instead.
+ items = [_mx_np.expand_dims(x, 0) for x in items]
+ try:
+ concate_outputs = _mx_np.concatenate(items, axis=0)
+ if steps != max_iterations and items:
+ to_pad = max_iterations - steps
+ concate_outputs = _mx_np.pad(concate_outputs, pad_width=((0, to_pad), (0, 0)))
+ stacked_outputs.append(concate_outputs)
+ except ValueError:
+ raise ValueError("\n".join(
+ ["Shapes of %d-th elements in step_outputs are inconsistent, which are:" % i_th] +
+ ["  Step %d, shape is %s" % (i, str(x.shape)) for i, x in enumerate(items)]
+ ))
+ if out_fmt is not None:
+ stacked_outputs, _ = _regroup(stacked_outputs, out_fmt)
+ if not_loop_var_list:
+ loop_vars = loop_vars[0]
+ return stacked_outputs, loop_vars
+
+
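The stack-and-pad behavior noted in the warning (axis 0 of each stacked output is always `max_iterations`) can likewise be mimicked in plain NumPy. A hypothetical sketch (`np_while_loop` is an illustrative name), assuming zero-padding as in the implementation above:

```python
import numpy as np

def np_while_loop(cond, func, loop_vars, max_iterations):
    # Collect per-step outputs, then zero-pad along axis 0 so the
    # stacked result always has max_iterations rows.
    steps, outputs = 0, []
    while steps < max_iterations and bool(cond(*loop_vars)):
        step_output, loop_vars = func(*loop_vars)
        outputs.append(step_output)
        steps += 1
    stacked = np.stack(outputs) if outputs else np.empty((0,))
    pad = max_iterations - steps
    if pad and outputs:
        stacked = np.pad(stacked, [(0, pad)] + [(0, 0)] * (stacked.ndim - 1))
    return stacked, loop_vars

cond = lambda i, s: i <= 5
func = lambda i, s: (i + s, (i + 1, s + i))
outs, (i, s) = np_while_loop(cond, func, (np.int64(0), np.int64(1)), max_iterations=10)
print(outs[:6])  # [ 1  2  4  7 11 16]
print(i, s)      # 6 16
```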
+@set_module('mxnet.numpy_extension')
+def cond(pred, then_func, else_func):
+ """Run an if-then-else using user-defined condition and computation
+
+ This operator simulates an if-like branch which chooses to do one of
+ the two customized computations according to the specified condition.
+
+ `pred` is a scalar MXNet NDArray,
+ indicating which branch of computation should be used.
+
+ `then_func` is a user-defined function, used as computation of the then branch.
+ It produces `outputs`, which is a list of NDArrays.
+ The signature of `then_func` should be
+ `then_func() => NDArray or nested List[NDArray]`.
+
+ `else_func` is a user-defined function, used as computation of the else branch.
+ It produces `outputs`, which is a list of NDArrays.
+ The signature of `else_func` should be
+ `else_func() => NDArray or nested List[NDArray]`.
+
+ The `outputs` produced by `then_func` and `else_func` should have the same number
+ of elements, all of which should have the same shape, dtype and stype.
+
+ This function returns an NDArray or nested lists of NDArrays, representing the computation result.
+
+ Parameters
+ ----------
+ pred: an MXNet NDArray representing a scalar.
+ The branch condition.
+ then_func: a Python function.
+ The computation to be executed if `pred` is true.
+ else_func: a Python function.
+ The computation to be executed if `pred` is false.
+
+ Returns
+ -------
+ outputs: an NDArray or nested lists of NDArrays, representing the result of the computation.
+
+ Examples
+ --------
+ >>> a, b = mx.np.array([1]), mx.np.array([2])
+ >>> pred = a * b < 5
+ >>> then_func = lambda: (a + 5) * (b + 5)
+ >>> else_func = lambda: (a - 5) * (b - 5)
+ >>> outputs = mx.npx.cond(pred, then_func, else_func)
+ >>> outputs
+ array([42.])
+ """
+ def _to_python_scalar(inputs, type_, name):
+ """Converts "inputs", possibly a typed MXNet NDArray, a numpy ndarray, or
+ another Python type, to the given type.
+ """
+ if hasattr(inputs, "item"):
+ inputs = inputs.item()
+ try:
+ inputs = type_(inputs)
+ except (TypeError, ValueError):
+ raise ValueError("Cannot convert %s to python %s" % (name, type_.__name__))
+ return inputs
+
+ branch = _to_python_scalar(pred, bool, "pred")
+ if branch:
+ return then_func()
+ else:
+ return else_func()
Review comment:
for the control flow operators, you can't use the ndarray
implementation. Otherwise, the deferred compute will only trace the branch that
was evaluated at the time of the graph tracing.
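A toy tracer (hypothetical, not MXNet's deferred compute) illustrates the point: an eager Python `if` records only the branch taken at trace time, so replaying the trace ignores the predicate entirely.

```python
# Toy tracer: records each primitive op. An eager Python `if` records
# only the branch taken at trace time, so the replayed "graph" is wrong
# for inputs that would take the other branch.
trace = []

def mul(a, b):
    trace.append(("mul", a, b))
    return a * b

def eager_cond(pred, then_func, else_func):
    # Branch decided at trace time -- only one branch is recorded.
    return then_func() if pred else else_func()

def replay(x):
    # Replaying the trace ignores the predicate entirely.
    out = x
    for op, _, b in trace:
        if op == "mul":
            out = out * b
    return out

y = eager_cond(3 < 5, lambda: mul(3, 10), lambda: mul(3, -10))
print(y)          # 30: only the then-branch ran and was recorded
print(replay(7))  # 70: replay always multiplies by 10, whatever the input
```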
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -152,190 +124,7 @@ def test_lstmp():
check_rnn_states(fused_states, stack_states, num_layers, True)
-@assert_raises_cudnn_not_satisfied(min_version='5.1.10')
-def test_lstm_cpu_inference():
- # should behave the same as lstm cell
- EXPECTED_LSTM_OUTPUT = np.array([[[0.72045636, 0.72045636, 0.95215213, 0.95215213],
-                                   [0.72045636, 0.72045636, 0.95215213, 0.95215213]],
-                                  [[0.95215213, 0.95215213, 0.72045636, 0.72045636],
-                                   [0.95215213, 0.95215213, 0.72045636, 0.72045636]]])
- x = mx.nd.ones(shape=(2, 2, 2))
- model = mx.gluon.rnn.LSTM(2, num_layers=6, bidirectional=True)
- model.initialize(mx.init.One())
-
- y = model(x).asnumpy()
- mx.test_utils.assert_almost_equal(y, EXPECTED_LSTM_OUTPUT,
- rtol=1e-3, atol=1e-5)
-
-
-def test_gru():
- cell = gluon.rnn.GRUCell(100, activation='relu', recurrent_activation='tanh')
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs)
- outputs = mx.sym.Group(outputs)
- assert sorted(cell.collect_params().keys()) == ['h2h_bias', 'h2h_weight', 'i2h_bias', 'i2h_weight']
- assert outputs.list_outputs() == [
- 'grucell_t0_out_output', 'grucell_t1_out_output',
- 'grucell_t2_out_output'
- ]
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10,50), t1_data=(10,50), t2_data=(10,50))
- assert outs == [(10, 100), (10, 100), (10, 100)]
-
-
[email protected]
-def test_residual():
- cell = gluon.rnn.ResidualCell(gluon.rnn.GRUCell(50))
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(2)]
- outputs, _ = cell.unroll(2, inputs)
- outputs = mx.sym.Group(outputs)
- params = cell.collect_params()
- assert sorted(params.keys()) == \
- ['base_cell.h2h_bias', 'base_cell.h2h_weight', 'base_cell.i2h_bias', 'base_cell.i2h_weight']
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10, 50), t1_data=(10, 50))
- assert outs == [(10, 50), (10, 50)]
- outputs = outputs.eval(**{'t0_data': mx.nd.ones((10, 50)),
-                           't1_data': mx.nd.ones((10, 50)),
-                           cell.base_cell.i2h_weight.var().name: mx.nd.zeros((150, 50)),
-                           cell.base_cell.i2h_bias.var().name: mx.nd.zeros((150, )),
-                           cell.base_cell.h2h_weight.var().name: mx.nd.zeros((150, 50)),
-                           cell.base_cell.h2h_bias.var().name: mx.nd.zeros((150, ))})
- expected_outputs = np.ones((10, 50))
- assert np.array_equal(outputs[0].asnumpy(), expected_outputs)
- assert np.array_equal(outputs[1].asnumpy(), expected_outputs)
-
-
[email protected]
-def test_residual_bidirectional():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -152,190 +124,7 @@ def test_lstmp():
check_rnn_states(fused_states, stack_states, num_layers, True)
-@assert_raises_cudnn_not_satisfied(min_version='5.1.10')
-def test_lstm_cpu_inference():
- # should behave the same as lstm cell
- EXPECTED_LSTM_OUTPUT = np.array([[[0.72045636, 0.72045636, 0.95215213, 0.95215213],
-                                   [0.72045636, 0.72045636, 0.95215213, 0.95215213]],
-                                  [[0.95215213, 0.95215213, 0.72045636, 0.72045636],
-                                   [0.95215213, 0.95215213, 0.72045636, 0.72045636]]])
- x = mx.nd.ones(shape=(2, 2, 2))
- model = mx.gluon.rnn.LSTM(2, num_layers=6, bidirectional=True)
- model.initialize(mx.init.One())
-
- y = model(x).asnumpy()
- mx.test_utils.assert_almost_equal(y, EXPECTED_LSTM_OUTPUT,
- rtol=1e-3, atol=1e-5)
-
-
-def test_gru():
- cell = gluon.rnn.GRUCell(100, activation='relu', recurrent_activation='tanh')
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs)
- outputs = mx.sym.Group(outputs)
- assert sorted(cell.collect_params().keys()) == ['h2h_bias', 'h2h_weight', 'i2h_bias', 'i2h_weight']
- assert outputs.list_outputs() == [
- 'grucell_t0_out_output', 'grucell_t1_out_output',
- 'grucell_t2_out_output'
- ]
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10,50), t1_data=(10,50), t2_data=(10,50))
- assert outs == [(10, 100), (10, 100), (10, 100)]
-
-
[email protected]
-def test_residual():
- cell = gluon.rnn.ResidualCell(gluon.rnn.GRUCell(50))
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(2)]
- outputs, _ = cell.unroll(2, inputs)
- outputs = mx.sym.Group(outputs)
- params = cell.collect_params()
- assert sorted(params.keys()) == \
- ['base_cell.h2h_bias', 'base_cell.h2h_weight', 'base_cell.i2h_bias', 'base_cell.i2h_weight']
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10, 50), t1_data=(10, 50))
- assert outs == [(10, 50), (10, 50)]
- outputs = outputs.eval(**{'t0_data': mx.nd.ones((10, 50)),
-                           't1_data': mx.nd.ones((10, 50)),
-                           cell.base_cell.i2h_weight.var().name: mx.nd.zeros((150, 50)),
-                           cell.base_cell.i2h_bias.var().name: mx.nd.zeros((150, )),
-                           cell.base_cell.h2h_weight.var().name: mx.nd.zeros((150, 50)),
-                           cell.base_cell.h2h_bias.var().name: mx.nd.zeros((150, ))})
- expected_outputs = np.ones((10, 50))
- assert np.array_equal(outputs[0].asnumpy(), expected_outputs)
- assert np.array_equal(outputs[1].asnumpy(), expected_outputs)
-
-
[email protected]
-def test_residual_bidirectional():
- cell = gluon.rnn.ResidualCell(
- gluon.rnn.BidirectionalCell(
- gluon.rnn.GRUCell(25),
- gluon.rnn.GRUCell(25)))
- inputs = [mx.sym.Variable('rnn_t%d_data'%i) for i in range(2)]
- outputs, _ = cell.unroll(2, inputs, merge_outputs=False)
- outputs = mx.sym.Group(outputs)
- params = cell.collect_params()
- assert sorted(params.keys()) == \
- ['base_cell.l_cell.h2h_bias', 'base_cell.l_cell.h2h_weight',
- 'base_cell.l_cell.i2h_bias', 'base_cell.l_cell.i2h_weight',
- 'base_cell.r_cell.h2h_bias', 'base_cell.r_cell.h2h_weight',
- 'base_cell.r_cell.i2h_bias', 'base_cell.r_cell.i2h_weight']
- # assert outputs.list_outputs() == \
- # ['bi_t0_plus_residual_output', 'bi_t1_plus_residual_output']
-
- args, outs, auxs = outputs.infer_shape(rnn_t0_data=(10, 50), rnn_t1_data=(10, 50))
- assert outs == [(10, 50), (10, 50)]
- outputs = outputs.eval(**{'rnn_t0_data': mx.nd.ones((10, 50))+5,
-                           'rnn_t1_data': mx.nd.ones((10, 50))+5,
-                           cell.base_cell.l_cell.i2h_weight.var().name: mx.nd.zeros((75, 50)),
-                           cell.base_cell.l_cell.i2h_bias.var().name: mx.nd.zeros((75,)),
-                           cell.base_cell.l_cell.h2h_weight.var().name: mx.nd.zeros((75, 25)),
-                           cell.base_cell.l_cell.h2h_bias.var().name: mx.nd.zeros((75,)),
-                           cell.base_cell.r_cell.i2h_weight.var().name: mx.nd.zeros((75, 50)),
-                           cell.base_cell.r_cell.i2h_bias.var().name: mx.nd.zeros((75,)),
-                           cell.base_cell.r_cell.h2h_weight.var().name: mx.nd.zeros((75, 25)),
-                           cell.base_cell.r_cell.h2h_bias.var().name: mx.nd.zeros((75,))})
- expected_outputs = np.ones((10, 50))+5
- assert np.array_equal(outputs[0].asnumpy(), expected_outputs)
- assert np.array_equal(outputs[1].asnumpy(), expected_outputs)
-
-
-def test_stack():
- cell = gluon.rnn.SequentialRNNCell()
- for i in range(5):
- if i == 1:
- cell.add(gluon.rnn.ResidualCell(gluon.rnn.LSTMCell(100)))
- else:
- cell.add(gluon.rnn.LSTMCell(100))
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs)
- outputs = mx.sym.Group(outputs)
- keys = sorted(cell.collect_params().keys())
- for i in range(5):
- if i==1:
- continue
- assert '%d.h2h_weight'%i in keys
- assert '%d.h2h_bias'%i in keys
- assert '%d.i2h_weight'%i in keys
- assert '%d.i2h_bias'%i in keys
- assert '1.base_cell.h2h_weight' in keys
- assert '1.base_cell.h2h_bias' in keys
- assert '1.base_cell.i2h_weight' in keys
- assert '1.base_cell.i2h_bias' in keys
- assert outputs.list_outputs() == ['lstmcell_t0_out_output', 'lstmcell_t1_out_output', 'lstmcell_t2_out_output']
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10,50), t1_data=(10,50), t2_data=(10,50))
- assert outs == [(10, 100), (10, 100), (10, 100)]
-
-
[email protected]
-def test_hybridstack():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -32,49 +31,19 @@ def check_rnn_states(fused_states, stack_states, num_layers, bidirectional=False
assert len(stack_states) / len(fused_states) == num_layers * directions
fused_states = [state.asnumpy() for state in fused_states]
- stack_states = [np.expand_dims(state.asnumpy(), axis=0) for state in stack_states]
+ stack_states = [_np.expand_dims(state.asnumpy(), axis=0) for state in stack_states]
if is_lstm:
stack_states_h = stack_states[0::2]
stack_states_c = stack_states[1::2]
- stack_states = [np.concatenate(stack_states_h, axis=0), np.concatenate(stack_states_c, axis=0)]
+ stack_states = [_np.concatenate(stack_states_h, axis=0), _np.concatenate(stack_states_c, axis=0)]
else:
- stack_states = [np.concatenate(stack_states, axis=0)]
+ stack_states = [_np.concatenate(stack_states, axis=0)]
for f, s in zip(fused_states, stack_states):
assert f.shape == s.shape
assert_almost_equal(f, s, atol=1e-4, rtol=1e-4)
-def test_rnn():
- cell = gluon.rnn.RNNCell(100)
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs)
- outputs = mx.sym.Group(outputs)
- assert sorted(cell.collect_params().keys()) == ['h2h_bias', 'h2h_weight',
- 'i2h_bias', 'i2h_weight']
- assert outputs.list_outputs() == [
- 'rnncell_t0_out_output', 'rnncell_t1_out_output',
- 'rnncell_t2_out_output'
- ]
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10,50), t1_data=(10,50), t2_data=(10,50))
- assert outs == [(10, 100), (10, 100), (10, 100)]
-
-
-def test_lstm():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -890,152 +657,53 @@ def __init__(self, rnn_size, time_step, **kwargs):
gluon.rnn.LSTMCell(rnn_size),
gluon.rnn.LSTMCell(rnn_size))
- def hybrid_forward(self, F, inputs, valid_len):
+ def forward(self, inputs, valid_len):
outputs, states = self.bi_lstm.unroll(self.time_step, inputs,
valid_length=valid_len,
layout='NTC',
merge_outputs=True)
return outputs, states
+
+ def infer_shape(self, x, *args):
+ self.bi_lstm.infer_shape(0, x.shape[x.ndim-1], True)
rnn_size = 100
net = BiLSTM(rnn_size, length)
+ inputs_data = mx.np.random.uniform(size=(10, length, 50))
+ net.infer_shape(inputs_data)
net.initialize()
net.hybridize()
- inputs_data = mx.nd.random.uniform(shape=(10, length, 50))
- valid_len = mx.nd.array([length]*10)
+ valid_len = mx.np.array([length]*10)
outputs, _ = net(inputs_data, valid_len)
assert outputs.shape == (10, length, 200)
_check_bidirectional_unroll_valid_length(1)
_check_bidirectional_unroll_valid_length(3)
-def check_rnn_cell(cell, in_shape=(10, 50), out_shape=(10, 100), begin_state=None):
- inputs = [mx.sym.Variable('rnn_t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs, begin_state=begin_state)
- outputs = mx.sym.Group(outputs)
- assert sorted(cell.collect_params().keys()) == ['h2h_bias', 'h2h_weight',
- 'i2h_bias', 'i2h_weight']
- assert outputs.list_outputs() == [type(cell).__name__.lower() + name for name in ['_t0_out_output', '_t1_out_output', '_t2_out_output']]
-
- args, outs, auxs = outputs.infer_shape(rnn_t0_data=in_shape,
- rnn_t1_data=in_shape,
- rnn_t2_data=in_shape)
- assert outs == [out_shape] * 3
-
-
def check_rnn_forward(layer, inputs):
inputs.attach_grad()
layer.initialize()
with mx.autograd.record():
layer.unroll(3, inputs, merge_outputs=True)[0].backward()
mx.autograd.backward(layer.unroll(3, inputs, merge_outputs=False)[0])
- mx.nd.waitall()
+mx.npx.waitall()
def test_rnn_cells():
check_rnn_forward(gluon.rnn.Conv1DLSTMCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
check_rnn_forward(gluon.rnn.Conv1DRNNCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
check_rnn_forward(gluon.rnn.Conv1DGRUCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
net = mx.gluon.rnn.SequentialRNNCell()
net.add(gluon.rnn.Conv1DLSTMCell((5, 7), 10, (3,), (3,)))
net.add(gluon.rnn.Conv1DRNNCell((10, 5), 11, (3,), (3,)))
net.add(gluon.rnn.Conv1DGRUCell((11, 3), 12, (3,), (3,)))
- check_rnn_forward(net, mx.nd.ones((8, 3, 5, 7)))
-
-
-def test_convrnn():
- cell = gluon.rnn.Conv1DRNNCell((10, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 50), out_shape=(1, 100, 48))
-
- cell = gluon.rnn.Conv2DRNNCell((10, 20, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 50), out_shape=(1, 100, 18, 48))
-
- cell = gluon.rnn.Conv3DRNNCell((10, 20, 30, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 30, 50), out_shape=(1, 100, 18, 28, 48))
-
-
-def test_convlstm():
- cell = gluon.rnn.Conv1DLSTMCell((10, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 50), out_shape=(1, 100, 48))
-
- cell = gluon.rnn.Conv2DLSTMCell((10, 20, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 50), out_shape=(1, 100, 18, 48))
-
- cell = gluon.rnn.Conv3DLSTMCell((10, 20, 30, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 30, 50), out_shape=(1, 100, 18, 28, 48))
-
-
-def test_convgru():
- cell = gluon.rnn.Conv1DGRUCell((10, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 50), out_shape=(1, 100, 48))
-
- cell = gluon.rnn.Conv2DGRUCell((10, 20, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 50), out_shape=(1, 100, 18, 48))
-
- cell = gluon.rnn.Conv3DGRUCell((10, 20, 30, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 30, 50), out_shape=(1, 100, 18, 28, 48))
-
-
-def test_conv_fill_shape():
- cell = gluon.rnn.Conv1DLSTMCell((0, 7), 10, (3,), (3,))
- cell.hybridize()
- check_rnn_forward(cell, mx.nd.ones((8, 3, 5, 7)))
- assert cell.i2h_weight.shape[1] == 5, cell.i2h_weight.shape[1]
-
-
-def test_lstmp():
- nhid = 100
- nproj = 64
- cell = gluon.rnn.LSTMPCell(nhid, nproj)
- inputs = [mx.sym.Variable('rnn_t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs)
- outputs = mx.sym.Group(outputs)
- expected_params = ['h2h_bias', 'h2h_weight', 'h2r_weight', 'i2h_bias', 'i2h_weight']
- expected_outputs = [type(cell).__name__.lower() + name for name in ['_t0_out_output', '_t1_out_output', '_t2_out_output']]
- assert sorted(cell.collect_params().keys()) == expected_params
- assert outputs.list_outputs() == expected_outputs, outputs.list_outputs()
-
- args, outs, auxs = outputs.infer_shape(rnn_t0_data=(10,50), rnn_t1_data=(10,50), rnn_t2_data=(10,50))
- assert outs == [(10, nproj), (10, nproj), (10, nproj)]
-
-
-def test_vardrop():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead? Also, the correctness test should not be dropped.
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -371,22 +160,10 @@ def forward(self, inpt):
net_params[k].set_data(weights[k])
ref_net_params[k.replace('l0', '_lstm_fwd.l0').replace('r0', '_lstm_bwd.l0')].set_data(weights[k])
- data = mx.random.uniform(shape=(11, 10, in_size))
+ data = mx.np.random.uniform(size=(11, 10, in_size))
assert_allclose(net(data).asnumpy(), ref_net(data).asnumpy(), rtol=1e-04, atol=1e-02)
-
-def test_zoneout():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -890,152 +657,53 @@ def __init__(self, rnn_size, time_step, **kwargs):
gluon.rnn.LSTMCell(rnn_size),
gluon.rnn.LSTMCell(rnn_size))
- def hybrid_forward(self, F, inputs, valid_len):
+ def forward(self, inputs, valid_len):
outputs, states = self.bi_lstm.unroll(self.time_step, inputs,
valid_length=valid_len,
layout='NTC',
merge_outputs=True)
return outputs, states
+
+ def infer_shape(self, x, *args):
+ self.bi_lstm.infer_shape(0, x.shape[x.ndim-1], True)
rnn_size = 100
net = BiLSTM(rnn_size, length)
+ inputs_data = mx.np.random.uniform(size=(10, length, 50))
+ net.infer_shape(inputs_data)
net.initialize()
net.hybridize()
- inputs_data = mx.nd.random.uniform(shape=(10, length, 50))
- valid_len = mx.nd.array([length]*10)
+ valid_len = mx.np.array([length]*10)
outputs, _ = net(inputs_data, valid_len)
assert outputs.shape == (10, length, 200)
_check_bidirectional_unroll_valid_length(1)
_check_bidirectional_unroll_valid_length(3)
-def check_rnn_cell(cell, in_shape=(10, 50), out_shape=(10, 100), begin_state=None):
- inputs = [mx.sym.Variable('rnn_t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs, begin_state=begin_state)
- outputs = mx.sym.Group(outputs)
- assert sorted(cell.collect_params().keys()) == ['h2h_bias', 'h2h_weight',
- 'i2h_bias', 'i2h_weight']
- assert outputs.list_outputs() == [type(cell).__name__.lower() + name for name in ['_t0_out_output', '_t1_out_output', '_t2_out_output']]
-
- args, outs, auxs = outputs.infer_shape(rnn_t0_data=in_shape,
- rnn_t1_data=in_shape,
- rnn_t2_data=in_shape)
- assert outs == [out_shape] * 3
-
-
def check_rnn_forward(layer, inputs):
inputs.attach_grad()
layer.initialize()
with mx.autograd.record():
layer.unroll(3, inputs, merge_outputs=True)[0].backward()
mx.autograd.backward(layer.unroll(3, inputs, merge_outputs=False)[0])
- mx.nd.waitall()
+mx.npx.waitall()
def test_rnn_cells():
check_rnn_forward(gluon.rnn.Conv1DLSTMCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
check_rnn_forward(gluon.rnn.Conv1DRNNCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
check_rnn_forward(gluon.rnn.Conv1DGRUCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
net = mx.gluon.rnn.SequentialRNNCell()
net.add(gluon.rnn.Conv1DLSTMCell((5, 7), 10, (3,), (3,)))
net.add(gluon.rnn.Conv1DRNNCell((10, 5), 11, (3,), (3,)))
net.add(gluon.rnn.Conv1DGRUCell((11, 3), 12, (3,), (3,)))
- check_rnn_forward(net, mx.nd.ones((8, 3, 5, 7)))
-
-
-def test_convrnn():
- cell = gluon.rnn.Conv1DRNNCell((10, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 50), out_shape=(1, 100, 48))
-
- cell = gluon.rnn.Conv2DRNNCell((10, 20, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 50), out_shape=(1, 100, 18, 48))
-
- cell = gluon.rnn.Conv3DRNNCell((10, 20, 30, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 30, 50), out_shape=(1, 100, 18, 28, 48))
-
-
-def test_convlstm():
- cell = gluon.rnn.Conv1DLSTMCell((10, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 50), out_shape=(1, 100, 48))
-
- cell = gluon.rnn.Conv2DLSTMCell((10, 20, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 50), out_shape=(1, 100, 18, 48))
-
- cell = gluon.rnn.Conv3DLSTMCell((10, 20, 30, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 30, 50), out_shape=(1, 100, 18, 28, 48))
-
-
-def test_convgru():
- cell = gluon.rnn.Conv1DGRUCell((10, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 50), out_shape=(1, 100, 48))
-
- cell = gluon.rnn.Conv2DGRUCell((10, 20, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 50), out_shape=(1, 100, 18, 48))
-
- cell = gluon.rnn.Conv3DGRUCell((10, 20, 30, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 30, 50), out_shape=(1, 100, 18, 28, 48))
-
-
-def test_conv_fill_shape():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -890,152 +657,53 @@ def __init__(self, rnn_size, time_step, **kwargs):
gluon.rnn.LSTMCell(rnn_size),
gluon.rnn.LSTMCell(rnn_size))
- def hybrid_forward(self, F, inputs, valid_len):
+ def forward(self, inputs, valid_len):
outputs, states = self.bi_lstm.unroll(self.time_step, inputs,
valid_length=valid_len,
layout='NTC',
merge_outputs=True)
return outputs, states
+
+ def infer_shape(self, x, *args):
+ self.bi_lstm.infer_shape(0, x.shape[x.ndim-1], True)
rnn_size = 100
net = BiLSTM(rnn_size, length)
+ inputs_data = mx.np.random.uniform(size=(10, length, 50))
+ net.infer_shape(inputs_data)
net.initialize()
net.hybridize()
- inputs_data = mx.nd.random.uniform(shape=(10, length, 50))
- valid_len = mx.nd.array([length]*10)
+ valid_len = mx.np.array([length]*10)
outputs, _ = net(inputs_data, valid_len)
assert outputs.shape == (10, length, 200)
_check_bidirectional_unroll_valid_length(1)
_check_bidirectional_unroll_valid_length(3)
-def check_rnn_cell(cell, in_shape=(10, 50), out_shape=(10, 100), begin_state=None):
- inputs = [mx.sym.Variable('rnn_t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs, begin_state=begin_state)
- outputs = mx.sym.Group(outputs)
- assert sorted(cell.collect_params().keys()) == ['h2h_bias', 'h2h_weight',
- 'i2h_bias', 'i2h_weight']
- assert outputs.list_outputs() == [type(cell).__name__.lower() + name for name in ['_t0_out_output', '_t1_out_output', '_t2_out_output']]
-
- args, outs, auxs = outputs.infer_shape(rnn_t0_data=in_shape,
- rnn_t1_data=in_shape,
- rnn_t2_data=in_shape)
- assert outs == [out_shape] * 3
-
-
def check_rnn_forward(layer, inputs):
inputs.attach_grad()
layer.initialize()
with mx.autograd.record():
layer.unroll(3, inputs, merge_outputs=True)[0].backward()
mx.autograd.backward(layer.unroll(3, inputs, merge_outputs=False)[0])
- mx.nd.waitall()
+ mx.npx.waitall()
def test_rnn_cells():
check_rnn_forward(gluon.rnn.Conv1DLSTMCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
check_rnn_forward(gluon.rnn.Conv1DRNNCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
check_rnn_forward(gluon.rnn.Conv1DGRUCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
net = mx.gluon.rnn.SequentialRNNCell()
net.add(gluon.rnn.Conv1DLSTMCell((5, 7), 10, (3,), (3,)))
net.add(gluon.rnn.Conv1DRNNCell((10, 5), 11, (3,), (3,)))
net.add(gluon.rnn.Conv1DGRUCell((11, 3), 12, (3,), (3,)))
- check_rnn_forward(net, mx.nd.ones((8, 3, 5, 7)))
-
-
-def test_convrnn():
- cell = gluon.rnn.Conv1DRNNCell((10, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 50), out_shape=(1, 100, 48))
-
- cell = gluon.rnn.Conv2DRNNCell((10, 20, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 50), out_shape=(1, 100, 18, 48))
-
- cell = gluon.rnn.Conv3DRNNCell((10, 20, 30, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 30, 50), out_shape=(1, 100, 18, 28, 48))
-
-
-def test_convlstm():
- cell = gluon.rnn.Conv1DLSTMCell((10, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 50), out_shape=(1, 100, 48))
-
- cell = gluon.rnn.Conv2DLSTMCell((10, 20, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 50), out_shape=(1, 100, 18, 48))
-
- cell = gluon.rnn.Conv3DLSTMCell((10, 20, 30, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 30, 50), out_shape=(1, 100, 18, 28, 48))
-
-
-def test_convgru():
- cell = gluon.rnn.Conv1DGRUCell((10, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 50), out_shape=(1, 100, 48))
-
- cell = gluon.rnn.Conv2DGRUCell((10, 20, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 50), out_shape=(1, 100, 18, 48))
-
- cell = gluon.rnn.Conv3DGRUCell((10, 20, 30, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 30, 50), out_shape=(1, 100, 18, 28, 48))
-
-
-def test_conv_fill_shape():
- cell = gluon.rnn.Conv1DLSTMCell((0, 7), 10, (3,), (3,))
- cell.hybridize()
- check_rnn_forward(cell, mx.nd.ones((8, 3, 5, 7)))
- assert cell.i2h_weight.shape[1] == 5, cell.i2h_weight.shape[1]
-
-
-def test_lstmp():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -152,190 +124,7 @@ def test_lstmp():
check_rnn_states(fused_states, stack_states, num_layers, True)
-@assert_raises_cudnn_not_satisfied(min_version='5.1.10')
-def test_lstm_cpu_inference():
- # should behave the same as lstm cell
- EXPECTED_LSTM_OUTPUT = np.array([[[0.72045636, 0.72045636, 0.95215213, 0.95215213],
- [0.72045636, 0.72045636, 0.95215213, 0.95215213]],
- [[0.95215213, 0.95215213, 0.72045636, 0.72045636],
- [0.95215213, 0.95215213, 0.72045636, 0.72045636]]])
- x = mx.nd.ones(shape=(2, 2, 2))
- model = mx.gluon.rnn.LSTM(2, num_layers=6, bidirectional=True)
- model.initialize(mx.init.One())
-
- y = model(x).asnumpy()
- mx.test_utils.assert_almost_equal(y, EXPECTED_LSTM_OUTPUT,
- rtol=1e-3, atol=1e-5)
-
-
-def test_gru():
- cell = gluon.rnn.GRUCell(100, activation='relu', recurrent_activation='tanh')
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs)
- outputs = mx.sym.Group(outputs)
- assert sorted(cell.collect_params().keys()) == ['h2h_bias', 'h2h_weight', 'i2h_bias', 'i2h_weight']
- assert outputs.list_outputs() == [
- 'grucell_t0_out_output', 'grucell_t1_out_output',
- 'grucell_t2_out_output'
- ]
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10,50), t1_data=(10,50), t2_data=(10,50))
- assert outs == [(10, 100), (10, 100), (10, 100)]
-
-
[email protected]
-def test_residual():
- cell = gluon.rnn.ResidualCell(gluon.rnn.GRUCell(50))
- inputs = [mx.sym.Variable('t%d_data'%i) for i in range(2)]
- outputs, _ = cell.unroll(2, inputs)
- outputs = mx.sym.Group(outputs)
- params = cell.collect_params()
- assert sorted(params.keys()) == \
- ['base_cell.h2h_bias', 'base_cell.h2h_weight', 'base_cell.i2h_bias', 'base_cell.i2h_weight']
-
- args, outs, auxs = outputs.infer_shape(t0_data=(10, 50), t1_data=(10, 50))
- assert outs == [(10, 50), (10, 50)]
- outputs = outputs.eval(**{'t0_data': mx.nd.ones((10, 50)),
- 't1_data': mx.nd.ones((10, 50)),
- cell.base_cell.i2h_weight.var().name: mx.nd.zeros((150, 50)),
- cell.base_cell.i2h_bias.var().name: mx.nd.zeros((150, )),
- cell.base_cell.h2h_weight.var().name: mx.nd.zeros((150, 50)),
- cell.base_cell.h2h_bias.var().name: mx.nd.zeros((150, ))})
- expected_outputs = np.ones((10, 50))
- assert np.array_equal(outputs[0].asnumpy(), expected_outputs)
- assert np.array_equal(outputs[1].asnumpy(), expected_outputs)
-
-
[email protected]
-def test_residual_bidirectional():
- cell = gluon.rnn.ResidualCell(
- gluon.rnn.BidirectionalCell(
- gluon.rnn.GRUCell(25),
- gluon.rnn.GRUCell(25)))
- inputs = [mx.sym.Variable('rnn_t%d_data'%i) for i in range(2)]
- outputs, _ = cell.unroll(2, inputs, merge_outputs=False)
- outputs = mx.sym.Group(outputs)
- params = cell.collect_params()
- assert sorted(params.keys()) == \
- ['base_cell.l_cell.h2h_bias', 'base_cell.l_cell.h2h_weight',
- 'base_cell.l_cell.i2h_bias', 'base_cell.l_cell.i2h_weight',
- 'base_cell.r_cell.h2h_bias', 'base_cell.r_cell.h2h_weight',
- 'base_cell.r_cell.i2h_bias', 'base_cell.r_cell.i2h_weight']
- # assert outputs.list_outputs() == \
- # ['bi_t0_plus_residual_output', 'bi_t1_plus_residual_output']
-
- args, outs, auxs = outputs.infer_shape(rnn_t0_data=(10, 50), rnn_t1_data=(10, 50))
- assert outs == [(10, 50), (10, 50)]
- outputs = outputs.eval(**{'rnn_t0_data': mx.nd.ones((10, 50))+5,
- 'rnn_t1_data': mx.nd.ones((10, 50))+5,
- cell.base_cell.l_cell.i2h_weight.var().name: mx.nd.zeros((75, 50)),
- cell.base_cell.l_cell.i2h_bias.var().name: mx.nd.zeros((75,)),
- cell.base_cell.l_cell.h2h_weight.var().name: mx.nd.zeros((75, 25)),
- cell.base_cell.l_cell.h2h_bias.var().name: mx.nd.zeros((75,)),
- cell.base_cell.r_cell.i2h_weight.var().name: mx.nd.zeros((75, 50)),
- cell.base_cell.r_cell.i2h_bias.var().name: mx.nd.zeros((75,)),
- cell.base_cell.r_cell.h2h_weight.var().name: mx.nd.zeros((75, 25)),
- cell.base_cell.r_cell.h2h_bias.var().name: mx.nd.zeros((75,))})
- expected_outputs = np.ones((10, 50))+5
- assert np.array_equal(outputs[0].asnumpy(), expected_outputs)
- assert np.array_equal(outputs[1].asnumpy(), expected_outputs)
-
-
-def test_stack():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_numpy_loss.py
##########
@@ -16,7 +16,7 @@
# under the License.
import mxnet as mx
-import numpy as np
+import numpy as _np
Review comment:
use `onp` as the name for consistency
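For reference, the convention the comment alludes to is to reserve `np` for MXNet's NumPy-compatible namespace and alias official NumPy as `onp`. A minimal sketch of that import style (the `mxnet` import is shown commented out, since it is only available with MXNet installed):

```python
import numpy as onp  # "original" NumPy, per the suggested naming convention

# from mxnet import np, npx  # MXNet's NumPy-compatible namespaces (needs mxnet)

x = onp.ones((2, 3))  # unambiguously a plain NumPy array, distinct from mx.np
```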
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -890,152 +657,53 @@ def __init__(self, rnn_size, time_step, **kwargs):
-def test_convlstm():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_gluon_data_vision.py
##########
@@ -1,433 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one
Review comment:
The Gluon vision data pipeline use case still seems relevant. Why was the
test removed?
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -890,152 +657,53 @@ def __init__(self, rnn_size, time_step, **kwargs):
-def test_convrnn():
Review comment:
Should this test be adapted to testing the front-end infer_shape
functions instead?
##########
File path: tests/python/unittest/test_numpy_ndarray.py
##########
@@ -514,28 +514,23 @@ def check_binary_op_result(shape1, shape2, op, dtype=None):
def test_np_hybrid_block_multiple_outputs():
@use_np
class TestAllNumpyOutputs(HybridBlock):
- def hybrid_forward(self, F, x, *args, **kwargs):
- return F.np.add(x, x), F.np.multiply(x, x)
-
- class TestAllClassicOutputs(HybridBlock):
- def hybrid_forward(self, F, x, *args, **kwargs):
- return x.as_nd_ndarray() + x.as_nd_ndarray(), x.as_nd_ndarray() * x.as_nd_ndarray()
+ def forward(self, x, *args, **kwargs):
+ return np.add(x, x), np.multiply(x, x)
data_np = np.ones((2, 3))
- for block, expected_out_type in [(TestAllClassicOutputs, mx.nd.NDArray),
- (TestAllNumpyOutputs, np.ndarray)]:
- net = block()
- for hybridize in [True, False]:
- if hybridize:
- net.hybridize()
- out1, out2 = net(data_np)
- assert type(out1) is expected_out_type
- assert type(out2) is expected_out_type
+ block, expected_out_type = TestAllNumpyOutputs, np.ndarray
+ net = block()
+ for hybridize in [True, False]:
+ if hybridize:
+ net.hybridize()
Review comment:
nit: `net.hybridize(active=hybridize)`
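The nit points out that `hybridize` takes an `active` flag, so the `if` guard can collapse into a single call. A hypothetical stand-in class (invented here purely to illustrate the pattern, not the real `HybridBlock`):

```python
class FakeBlock:
    """Hypothetical stand-in for HybridBlock, tracking the hybridize flag."""
    def __init__(self):
        self.active = False

    def hybridize(self, active=True):
        # Mirrors the `active` parameter suggested in the review comment
        self.active = active

net = FakeBlock()
states = []
for hybridize in [True, False]:
    net.hybridize(active=hybridize)  # replaces: if hybridize: net.hybridize()
    states.append(net.active)
```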
##########
File path: python/mxnet/numpy_extension/control_flow.py
##########
@@ -0,0 +1,390 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""Namespace for registering numpy_extension ops for imperative programming."""
+
+from ..util import set_module
+from ..ndarray import NDArray
+from ..base import _as_list
+from .. import numpy as _mx_np
+
+
+__all__ = ["foreach", "while_loop", "cond"]
+
+
+def _flatten(args, inout_str):
+ if isinstance(args, NDArray):
+ return [args], int(0)
+
+ assert isinstance(args, (list, tuple)), \
+ "%s must be (nested) list of NDArray, " \
+ "but got %s of type %s"%(inout_str, str(args), str(type(args)))
+ flat = []
+ fmts = []
+ for i in args:
+ arg, fmt = _flatten(i, inout_str)
+ flat.extend(arg)
+ fmts.append(fmt)
+ return flat, fmts
+
+
+def _regroup(args, fmt):
+ if isinstance(fmt, int):
+ if fmt == 0:
+ return args[0], args[1:]
+ return args[:fmt], args[fmt:]
+
+ assert isinstance(args, (list, tuple)), \
+ "output must be (nested) list of NDArray, " \
+ "but got %s of type %s"%(str(args), str(type(args)))
+ ret = []
+ for i in fmt:
+ res, args = _regroup(args, i)
+ ret.append(res)
+ return ret, args
+
+
+@set_module('mxnet.numpy_extension')
+def foreach(body, data, init_states):
+ """Run a for loop with user-defined computation over NDArrays on dimension
0.
+
+ This operator simulates a for loop, and body holds the computation for an iteration
+ of the for loop. It runs the computation in body on each slice from the input
+ NDArrays.
+
+ body takes two arguments as input and outputs a tuple of two elements,
+ as illustrated below::
+
+ out, states = body(data1, states)
+
+ data1 can be either an NDArray or a list of NDArrays. If data is an NDArray,
+ data1 is an NDArray. Otherwise, data1 is a list of NDArrays and has the same
+ size as data. states is a list of NDArrays and has the same size as init_states.
+ Similarly, out can be either an NDArray or a list of NDArrays, which are concatenated
+ as the first output of foreach; states from the last execution of body
+ are the second output of foreach.
+
+ The computation done by this operator is equivalent to the pseudo code below
+ when the input data is an NDArray::
+
+ states = init_states
+ outs = []
+ for i in range(data.shape[0]):
+ s = data[i]
+ out, states = body(s, states)
+ outs.append(out)
+ outs = stack(*outs)
+
+
+ Parameters
+ ----------
+ body : a Python function.
+ Define computation in an iteration.
+ data: an NDArray or a list of NDArrays.
+ The input data.
+ init_states: an NDArray or nested lists of NDArrays.
+ The initial values of the loop states.
+
+ Returns
+ -------
+ outputs: an NDArray or nested lists of NDArrays.
+ The output data concatenated from the output of all iterations.
+ states: an NDArray or nested lists of NDArrays.
+ The loop states in the last iteration.
+
+ Examples
+ --------
+ >>> step = lambda data, states: (data + states[0], [states[0] * 2])
+ >>> data = mx.np.random.uniform(size=(2, 10))
+ >>> states = [mx.np.random.uniform(size=(10))]
+ >>> outs, states = mx.npx.foreach(step, data, states)
+ """
+
+ def check_input(inputs, in_type, msg):
+ is_NDArray_or_list = True
+ if isinstance(inputs, list):
+ for i in inputs:
+ if not isinstance(i, in_type):
+ is_NDArray_or_list = False
+ break
+ else:
+ is_NDArray_or_list = isinstance(inputs, in_type)
+ assert is_NDArray_or_list, msg
+
+ flatten, _ = _flatten(data, "foreach input")
+ check_input(flatten, NDArray,
+ "data should be an NDArray or a nested list of NDArrays")
+ flatten, _ = _flatten(init_states, "foreach states")
+ check_input(flatten, NDArray,
+ "init_states should be an NDArray or a nested list of
NDArrays")
+
+ not_data_list = isinstance(data, NDArray)
+ num_iters = data.shape[0] if not_data_list else data[0].shape[0]
+ states = init_states
+ outputs = []
+ for i in range(num_iters):
+ if not_data_list:
+ eles = data[i]
+ else:
+ eles = [d[i] for d in data]
+ outs, states = body(eles, states)
+ outs, out_fmt = _flatten(outs, "foreach output")
+ outputs.append(outs)
+ outputs = zip(*outputs)
+ tmp_outputs = []
+ for out in outputs:
+ tmp_outputs.append(_mx_np.stack(out))
+ outputs = tmp_outputs
+ outputs, _ = _regroup(outputs, out_fmt)
+
+ return (outputs, states)
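The pseudo code in the docstring above can be traced with a plain-NumPy reference of the same semantics (a sketch only; `foreach_ref` is an invented name and does not depend on MXNet):

```python
import numpy as np

def foreach_ref(body, data, init_states):
    # Reference semantics: slice `data` along axis 0, thread `states` through
    states = init_states
    outs = []
    for i in range(data.shape[0]):
        out, states = body(data[i], states)
        outs.append(out)
    return np.stack(outs), states

step = lambda s, states: (s + states[0], [states[0] * 2])
data = np.ones((2, 3))
init = [np.full(3, 10.0)]
outs, final_states = foreach_ref(step, data, init)
# First slice: 1 + 10 = 11; second slice: 1 + 20 = 21; final state: 40
```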
+
+
+#pylint: disable=W0621
+@set_module('mxnet.numpy_extension')
+def while_loop(cond, func, loop_vars, max_iterations=None):
+ """Run a while loop with user-defined computation and loop condition.
+
+ This operator simulates a while loop that iteratively does customized computation
+ as long as the condition is satisfied.
+
+ `loop_vars` is a list of NDArrays that the computation uses.
+
+ `cond` is a user-defined function, used as the loop condition.
+ It consumes `loop_vars`, and produces a scalar MXNet NDArray,
+ indicating the termination of the loop.
+ The loop ends when `cond` returns false (zero).
+ The `cond` is variadic, and its signature should be
+ `cond(*loop_vars) => NDArray`.
+
+ `func` is a user-defined function, used as the loop body.
+ It also consumes `loop_vars`, and produces `step_output` and `new_loop_vars` at each step.
+ In each step, `step_output` should contain the same number of elements.
+ Through all steps, the i-th element of `step_output` should have the same shape and dtype.
+ Also, `new_loop_vars` should contain the same number of elements as `loop_vars`,
+ and the corresponding element should have the same shape and dtype.
+ The `func` is variadic, and its signature should be
+ `func(*loop_vars) => (NDArray or nested List[NDArray] step_output, NDArray or nested List[NDArray] new_loop_vars)`.
+
+ `max_iterations` is a scalar that defines the maximum number of iterations allowed.
+
+ This function returns two lists.
+ The first list has the length of `|step_output|`,
+ in which the i-th element is the stack of the i-th elements of
+ `step_output` from all steps, along axis 0.
+ The second list has the length of `|loop_vars|`,
+ which represents final states of loop variables.
+
+ .. warning::
+
+ For now, axis 0 of all NDArrays in the first list has size `max_iterations`,
+ due to the lack of dynamic shape inference.
+
+ .. warning::
+
+ When `cond` is never satisfied, we assume `step_output` is empty,
+ because it cannot be inferred. This is different from the symbolic version.
+
+ Parameters
+ ----------
+ cond: a Python function.
+ The loop condition.
+ func: a Python function.
+ The loop body.
+ loop_vars: an NDArray or nested lists of NDArrays.
+ The initial values of the loop variables.
+ max_iterations: a python int.
+ Maximum number of iterations.
+
+ Returns
+ -------
+ outputs: an NDArray or nested lists of NDArrays
+ stacked output from each step
+ states: an NDArray or nested lists of NDArrays
+ final state
+
+ Examples
+ --------
+ >>> cond = lambda i, s: i <= 5
+ >>> func = lambda i, s: ([i + s], [i + 1, s + i])
+ >>> loop_vars = (mx.np.array([0], dtype="int64"), mx.np.array([1],
dtype="int64"))
+ >>> outputs, states = mx.npx.while_loop(cond, func, loop_vars,
max_iterations=10)
+ >>> outputs
+ [
+ [[ 1]
+ [ 2]
+ [ 4]
+ [ 7]
+ [11]
+ [16]
+ [...] # undefined value
+ [...]
+ [...]
+ [...]]
+ <NDArray 6x1 @cpu(0)>]
+ >>> states
+ [
+ [6]
+ <NDArray 1 @cpu(0)>,
+ [16]
+ <NDArray 1 @cpu(0)>]
+ """
+ def _to_python_scalar(inputs, type_, name):
+ """Converts "inputs", possibly typed mxnet NDArray, a numpy ndarray,
other python types,
+ to the given type
+ """
+ if isinstance(inputs, NDArray):
+ inputs = inputs.item()
+ try:
+ inputs = type_(inputs)
+ except (TypeError, ValueError):
+ raise ValueError("Cannot convert %s to python %s" % (name, type_.__name__))
+ return inputs
+
+ def _func_wrapper(loop_vars):
+ """This wrapper unifies
+ "func: loop_vars -> new_loop_vars"
+ and "func: loop_vars -> (step_output, new_loop_vars)"
+ into "func: loop_vars -> (None or tuple of step_outputs, tuple of
new_loop_vars)
+ """
+ step_output, new_loop_vars = func(*loop_vars)
+ if step_output is None:
+ step_output = []
+ if new_loop_vars is None:
+ new_loop_vars = []
+ if isinstance(step_output, tuple):
+ step_output = list(step_output)
+ if isinstance(new_loop_vars, tuple):
+ new_loop_vars = list(new_loop_vars)
+ new_loop_vars = _as_list(new_loop_vars)
+ if len(loop_vars) != len(new_loop_vars):
+ raise ValueError("The length of loop_vars should be consistent
during the loop")
+ return step_output, new_loop_vars
+
+ if max_iterations is None:
+ raise ValueError("max_iterations should be specified")
+ max_iterations = _to_python_scalar(max_iterations, int, "max_iteration")
+ # It should also work if loop_vars is empty, but it is
+ # semantically unnecessary to include this case.
+ if len(loop_vars) == 0:
+ raise ValueError("loop_vars should contain at least one element")
+
+ steps = 0
+ outputs = []
+ # there might not be an iteration.
+ out_fmt = None
+ not_loop_var_list = isinstance(loop_vars, NDArray)
+ loop_vars = _as_list(loop_vars)
+ while steps < max_iterations and \
+ _to_python_scalar(cond(*loop_vars), bool, "Return value of cond"):
# loop condition
+ step_output, loop_vars = _func_wrapper(loop_vars)
+ step_output, out_fmt = _flatten(step_output, "while output")
+ outputs.append(step_output)
+ steps += 1
+ if len(outputs) != steps or len(step_output) != len(outputs[0]):
+ raise ValueError("Number of elements in step_output should be the
same in each step")
+ stacked_outputs = []
+ for i_th, items in enumerate(zip(*outputs), 1):
+ # `mx.ndarray.pad` only supports 4-D or 5-D inputs for now,
+ # so we cannot use it.
+ items = [_mx_np.expand_dims(x, 0) for x in items]
+ try:
+ concate_outputs = _mx_np.concatenate(items, axis=0)
+ if steps != max_iterations and items:
+ to_pad = max_iterations - steps
+ concate_outputs = _mx_np.pad(concate_outputs, pad_width=((0, to_pad), (0, 0)))
+ stacked_outputs.append(concate_outputs)
+ except ValueError:
+ raise ValueError("\n".join(
+ ["Shapes of %d-th elements in step_outputs are inconsistent,
which are:" % i_th] +
+ [" Step %d, shape is %s" % (i, str(x.shape)) for i, x in
enumerate(items)]
+ ))
+ if out_fmt is not None:
+ stacked_outputs, _ = _regroup(stacked_outputs, out_fmt)
+ if not_loop_var_list:
+ loop_vars = loop_vars[0]
+ return stacked_outputs, loop_vars
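The docstring example above can be checked against a pure-Python reference of the loop semantics (a sketch; `while_loop_ref` is invented here, uses Python ints instead of NDArrays, and skips the padding behavior):

```python
def while_loop_ref(cond, func, loop_vars, max_iterations):
    # Reference semantics: run func while cond holds, then gather the
    # i-th step outputs across all steps
    outputs = []
    steps = 0
    while steps < max_iterations and cond(*loop_vars):
        step_output, loop_vars = func(*loop_vars)
        outputs.append(step_output)
        steps += 1
    stacked = [list(items) for items in zip(*outputs)]
    return stacked, loop_vars

cond_fn = lambda i, s: i <= 5
func_fn = lambda i, s: ([i + s], [i + 1, s + i])
outputs, states = while_loop_ref(cond_fn, func_fn, [0, 1], max_iterations=10)
# outputs[0] collects i + s at each step: [1, 2, 4, 7, 11, 16]
```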
+
+
+@set_module('mxnet.numpy_extension')
+def cond(pred, then_func, else_func):
+ """Run an if-then-else using user-defined condition and computation
+
+ This operator simulates an if-like branch which chooses to do one of
+ the two customized computations according to the specified condition.
+
+ `pred` is a scalar MXNet NDArray,
+ indicating which branch of computation should be used.
+
+ `then_func` is a user-defined function, used as computation of the then branch.
+ It produces `outputs`, which is a list of NDArrays.
+ The signature of `then_func` should be
+ `then_func() => NDArray or nested List[NDArray]`.
+
+ `else_func` is a user-defined function, used as computation of the else branch.
+ It produces `outputs`, which is a list of NDArrays.
+ The signature of `else_func` should be
+ `else_func() => NDArray or nested List[NDArray]`.
+
+ The `outputs` produced by `then_func` and `else_func` should have the same number
+ of elements, all of which should have the same shape, dtype and stype.
+
+ This function returns an NDArray or nested lists of NDArrays, representing the computation result.
+
+ Parameters
+ ----------
+ pred: an MXNet NDArray representing a scalar.
+ The branch condition.
+ then_func: a Python function.
+ The computation to be executed if `pred` is true.
+ else_func: a Python function.
+ The computation to be executed if `pred` is false.
+
+ Returns
+ -------
+ outputs: an NDArray or nested lists of NDArrays, representing the result of computation.
+
+ Examples
+ --------
+ >>> a, b = mx.np.array([1]), mx.np.array([2])
+ >>> pred = a * b < 5
+ >>> then_func = lambda: (a + 5) * (b + 5)
+ >>> else_func = lambda: (a - 5) * (b - 5)
+ >>> outputs = mx.npx.cond(pred, then_func, else_func)
+ >>> outputs
+ array([42.])
+ """
+ def _to_python_scalar(inputs, type_, name):
+ """Converts "inputs", possibly typed mxnet NDArray, a numpy ndarray,
other python types,
+ to the given type
+ """
+ if hasattr(inputs, "asscalar"):
+ inputs = inputs.item()
+ try:
+ inputs = type_(inputs)
+ except:
+ raise ValueError("Cannot convert %s to python %s" % (name,
type_.__name__))
+ return inputs
+
+ branch = _to_python_scalar(pred, bool, "pred")
+ if branch:
+ return then_func()
+ else:
+ return else_func()
Review comment:
please add tests for control flow operators in the new namespace for
both hybridized and non-hybridized versions.
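As a starting point for such tests, the eager `cond` semantics reduce to an ordinary Python branch, so a minimal check might look like this (a sketch only; `cond_ref` is a pure-Python stand-in for the operator, not the real API):

```python
def cond_ref(pred, then_func, else_func):
    # Eager dispatch: evaluate pred once, run exactly one branch
    return then_func() if bool(pred) else else_func()

a, b = 1, 2
out_then = cond_ref(a * b < 5, lambda: (a + 5) * (b + 5), lambda: (a - 5) * (b - 5))
out_else = cond_ref(a * b > 5, lambda: (a + 5) * (b + 5), lambda: (a - 5) * (b - 5))
# 1*2 < 5, so the then branch runs: (1+5)*(2+5) = 42
# 1*2 > 5 is false, so the else branch runs: (1-5)*(2-5) = 12
```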
##########
File path: tests/python/unittest/test_gluon_rnn.py
##########
@@ -890,152 +657,53 @@ def __init__(self, rnn_size, time_step, **kwargs):
gluon.rnn.LSTMCell(rnn_size),
gluon.rnn.LSTMCell(rnn_size))
- def hybrid_forward(self, F, inputs, valid_len):
+ def forward(self, inputs, valid_len):
outputs, states = self.bi_lstm.unroll(self.time_step, inputs,
valid_length=valid_len,
layout='NTC',
merge_outputs=True)
return outputs, states
+
+ def infer_shape(self, x, *args):
+ self.bi_lstm.infer_shape(0, x.shape[x.ndim-1], True)
rnn_size = 100
net = BiLSTM(rnn_size, length)
+ inputs_data = mx.np.random.uniform(size=(10, length, 50))
+ net.infer_shape(inputs_data)
net.initialize()
net.hybridize()
- inputs_data = mx.nd.random.uniform(shape=(10, length, 50))
- valid_len = mx.nd.array([length]*10)
+ valid_len = mx.np.array([length]*10)
outputs, _ = net(inputs_data, valid_len)
assert outputs.shape == (10, length, 200)
_check_bidirectional_unroll_valid_length(1)
_check_bidirectional_unroll_valid_length(3)
-def check_rnn_cell(cell, in_shape=(10, 50), out_shape=(10, 100),
begin_state=None):
- inputs = [mx.sym.Variable('rnn_t%d_data'%i) for i in range(3)]
- outputs, _ = cell.unroll(3, inputs, begin_state=begin_state)
- outputs = mx.sym.Group(outputs)
- assert sorted(cell.collect_params().keys()) == ['h2h_bias', 'h2h_weight',
- 'i2h_bias', 'i2h_weight']
- assert outputs.list_outputs() == [type(cell).__name__.lower() + name for
name in ['_t0_out_output', '_t1_out_output', '_t2_out_output']]
-
- args, outs, auxs = outputs.infer_shape(rnn_t0_data=in_shape,
- rnn_t1_data=in_shape,
- rnn_t2_data=in_shape)
- assert outs == [out_shape] * 3
-
-
def check_rnn_forward(layer, inputs):
inputs.attach_grad()
layer.initialize()
with mx.autograd.record():
layer.unroll(3, inputs, merge_outputs=True)[0].backward()
mx.autograd.backward(layer.unroll(3, inputs, merge_outputs=False)[0])
- mx.nd.waitall()
+mx.npx.waitall()
def test_rnn_cells():
check_rnn_forward(gluon.rnn.Conv1DLSTMCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
check_rnn_forward(gluon.rnn.Conv1DRNNCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
check_rnn_forward(gluon.rnn.Conv1DGRUCell((5, 7), 10, (3,), (3,)),
- mx.nd.ones((8, 3, 5, 7)))
+ mx.np.ones((8, 3, 5, 7)))
net = mx.gluon.rnn.SequentialRNNCell()
net.add(gluon.rnn.Conv1DLSTMCell((5, 7), 10, (3,), (3,)))
net.add(gluon.rnn.Conv1DRNNCell((10, 5), 11, (3,), (3,)))
net.add(gluon.rnn.Conv1DGRUCell((11, 3), 12, (3,), (3,)))
- check_rnn_forward(net, mx.nd.ones((8, 3, 5, 7)))
-
-
-def test_convrnn():
- cell = gluon.rnn.Conv1DRNNCell((10, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 50), out_shape=(1, 100, 48))
-
- cell = gluon.rnn.Conv2DRNNCell((10, 20, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 50), out_shape=(1, 100, 18, 48))
-
- cell = gluon.rnn.Conv3DRNNCell((10, 20, 30, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 30, 50), out_shape=(1, 100, 18, 28, 48))
-
-
-def test_convlstm():
- cell = gluon.rnn.Conv1DLSTMCell((10, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 50), out_shape=(1, 100, 48))
-
- cell = gluon.rnn.Conv2DLSTMCell((10, 20, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 50), out_shape=(1, 100, 18, 48))
-
- cell = gluon.rnn.Conv3DLSTMCell((10, 20, 30, 50), 100, 3, 3)
- check_rnn_cell(cell, in_shape=(1, 10, 20, 30, 50), out_shape=(1, 100, 18, 28, 48))
-
-
-def test_convgru():
Review comment:
Should this test be adapted to test the front-end infer_shape
functions instead?
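For reference, the removed symbolic tests were asserting a simple output-shape rule that a front-end infer_shape test could check directly: the batch dimension is kept, the channel dimension becomes `num_hidden`, and each spatial dimension shrinks by a 'valid' convolution of the given kernel width. A minimal pure-Python sketch of that rule (the helper name is hypothetical, not part of MXNet):

```python
def conv_rnn_out_shape(in_shape, num_hidden, kernel):
    """Expected output shape of a ConvNDRNNCell for one time step.

    Hypothetical helper: batch is kept, the channel dimension becomes
    num_hidden, and every spatial dimension shrinks by a 'valid'
    convolution of width `kernel` (no padding, stride 1).
    """
    batch, _channels, *spatial = in_shape
    return (batch, num_hidden, *(s - kernel + 1 for s in spatial))

# The same (in_shape, out_shape) pairs the removed tests hard-coded:
assert conv_rnn_out_shape((1, 10, 50), 100, 3) == (1, 100, 48)
assert conv_rnn_out_shape((1, 10, 20, 50), 100, 3) == (1, 100, 18, 48)
assert conv_rnn_out_shape((1, 10, 20, 30, 50), 100, 3) == (1, 100, 18, 28, 48)
```

A front-end test could compute the expected shape this way and compare it against what the cell's infer_shape produces.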
##########
File path: python/mxnet/ndarray/numpy_extension/_op.py
##########
@@ -397,12 +397,15 @@ def fully_connected(x, weight, bias=None, num_hidden=None,
The output of this function.
"""
assert num_hidden is not None, "Please provide number of hidden nodes"
+ if bias is not None:
+ return _api_internal.fully_connected(x, weight, bias, num_hidden,
+ False, flatten)
if no_bias:
- return _api_internal.fully_connected(x, weight, num_hidden, no_bias, flatten)
+ return _api_internal.fully_connected(x, weight, num_hidden, True, flatten)
else:
assert bias is not None, "Missing bias parameter"
return _api_internal.fully_connected(x, weight, bias, num_hidden,
- no_bias, flatten)
+ False, flatten)
Review comment:
The original implementation appears correct.
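To illustrate why the change is questionable: with the new early return, the trailing `else` branch can never execute, and a caller who passes both `bias` and `no_bias=True` now silently takes the biased path instead of having the bias ignored. A stand-alone sketch with `_api_internal.fully_connected` replaced by a stub that just echoes its arguments (stub and function names are illustrative, not the real API):

```python
def _fc_stub(*args):
    # Stand-in for _api_internal.fully_connected: just echo the call.
    return args

def fully_connected_patched(x, weight, bias=None, num_hidden=None,
                            no_bias=False, flatten=True):
    """Mirrors the dispatch logic after the proposed change."""
    assert num_hidden is not None, "Please provide number of hidden nodes"
    if bias is not None:
        # New early return: taken even when no_bias=True was requested.
        return _fc_stub(x, weight, bias, num_hidden, False, flatten)
    if no_bias:
        return _fc_stub(x, weight, num_hidden, True, flatten)
    else:
        # Unreachable: bias is None on this path, so the assert always fires.
        assert bias is not None, "Missing bias parameter"

# With both bias and no_bias=True, the bias is no longer ignored:
call = fully_connected_patched('x', 'w', bias='b', num_hidden=4, no_bias=True)
assert call == ('x', 'w', 'b', 4, False, True)
```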
##########
File path: docs/python_docs/python/tutorials/packages/gluon/loss/custom-loss.md
##########
@@ -45,7 +45,7 @@ import random
The loss function uses a margin *m*, which has the effect that dissimilar
pairs only contribute if their loss is within a certain margin.
-In order to implement such a customized loss function in Gluon, we only need
to define a new class that inherits from the
[Loss](../../../../api/gluon/loss/index.rst#mxnet.gluon.loss.Loss) base class.
We then define the contrastive loss logic in the
[hybrid_forward](../../../../api/gluon/hybrid_block.rst#mxnet.gluon.HybridBlock.hybrid_forward)
method. This method takes the images `image1` and `image2` and the label, which
defines whether `image1` and `image2` are similar (=0) or dissimilar (=1).
The input F is an `mxnet.ndarray` or an `mxnet.symbol` if we hybridize the
network. Gluon's `Loss` base class is in fact a
[HybridBlock](../../../../api/gluon/hybrid_block.rst#mxnet.gluon.HybridBlock).
This means we can either run imperatively or symbolically. When we hybridize
our custom loss function, we can get performance speedups.
+In order to implement such a customized loss function in Gluon, we only need
to define a new class that inherits from the
[Loss](../../../../api/gluon/loss/index.rst#mxnet.gluon.loss.Loss) base class.
We then define the contrastive loss logic in the
[forward](../../../../api/gluon/hybrid_block.rst#mxnet.gluon.HybridBlock.forward)
method. This method takes the images `image1` and `image2` and the label, which
defines whether `image1` and `image2` are similar (=0) or dissimilar (=1).
The input F is an `mxnet.ndarray` or an `mxnet.symbol` if we hybridize the
network. Gluon's `Loss` base class is in fact a
[HybridBlock](../../../../api/gluon/hybrid_block.rst#mxnet.gluon.HybridBlock).
This means we can either run imperatively or symbolically. When we hybridize
our custom loss function, we can get performance speedups.
Review comment:
F is removed in the new forward interface in HybridBlock, so the
mentions of F should be removed as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]