reminisce opened a new issue #14253: [RFC] Introducing NumPy-compatible coding 
experience into MXNet
URL: https://github.com/apache/incubator-mxnet/issues/14253
 
 
   ## Motivation
   Today deep learning scientists spend the majority of their time on data 
processing, debugging tensor algorithms, and tuning model parameters, rather 
than architecting models from scratch, thanks to the abundance of pre-trained 
models in deep learning model zoos. This has made the usability of tensor APIs 
a key factor in a framework's wide adoption.
   
   MXNet was initially designed with a focus on memory efficiency, computation 
throughput, and scalability. Usability problems have since surfaced as more and 
more models exhibit dynamic behavior, e.g. tensors whose shapes are unknown 
before runtime, control flow that depends on runtime results, etc. Below we 
highlight users' most frequent usability complaints.
   - Scalar tensors (aka zero-dim tensors) are not supported. For example, 
given `a = [0, 1, 2]`, `a[1]` will generate an `NDArray` of shape `(1,)`, 
instead of `()` as in NumPy.
   - Zero-size tensors are not supported. For example, a tensor of shape `(0, 16, 
256)` cannot be passed to an operator, because our system currently treats 0, 
the first dimension size, as unknown rather than a concrete number.
   - Many operators' signatures and functionality are not NumPy compatible, 
e.g. `nd.dot` vs. `np.dot`, `nd.concatenate` vs. `np.concatenate`, etc.
   - Many NumPy operators are missing. See the [reference 
link](https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+numpy+label%3ANumpy)
 to GitHub issues.
   - Operators whose outputs' shapes can only be determined at runtime are not 
supported, e.g. `data[data < 0]` cannot run.
   - Diverged programming experience due to the separation of imperative and 
symbolic operators registered under `mxnet.ndarray` and `mxnet.symbol`.
   - Control flow operators are hard to use. Users have to understand the 
complicated signatures of control flow operators, instead of writing native 
Python code using `for`, `while`, `if/else`, etc.
   For example, we have learned the hard way that it does not make much sense 
to ask users to write code like the following to perform a cumulative sum.
   ```python
   def sum(state, i):
       s = state + data[i]
       return s, [s, i + 1]
   
   def sum_cond(state, i):
       return i < 4
       
    out, state = F.contrib.while_loop(sum_cond, sum, [F.zeros((1,)), F.zeros((1,))],
                                      max_iterations=5)
   ```
   Instead, users should be able to write native Python code like the following 
and, if required, let the framework serialize it into a computation graph for 
optimization and deployment.
   ```python
   data = np.arange(5)
   out = 0
   i = 0
   while i < 5:
    out = out + data[i]
    i += 1
   ```
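For reference, the NumPy behavior targeted by the first pain points above can be demonstrated directly (a minimal illustration using NumPy alone):

```python
import numpy as np

a = np.array([0, 1, 2])

# Integer indexing yields a zero-dim (scalar) tensor of shape (),
# not (1,) as in MXNet today.
assert a[1].shape == ()

# Boolean-mask indexing produces an output whose shape is only known
# at runtime -- here it happens to select two elements.
data = np.array([-1.0, 2.0, -3.0, 4.0])
selected = data[data < 0]
assert selected.shape == (2,)
```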
   
   It is not hard to see that all of the above pain points stem from the lack 
of a NumPy-compatible coding experience in MXNet. Fully addressing control flow 
support and consolidating the imperative and symbolic coding styles requires 
fundamental changes to the codebase, such as a new graph IR and executor; that 
effort is extremely non-trivial and should be executed with a long-term plan. 
In the meantime, we can improve usability by fixing the zero-dim/zero-size 
tensor issues and implementing NumPy operators in MXNet. We discuss how to 
achieve these short-term goals below.
   
   ## Support of zero-dim and zero-size tensors
   ### What's the problem?
   Zero-dim and zero-size tensors are valid tensors in NumPy. The former, whose 
shape is `()`, represent scalars in `numpy.ndarray` format. The latter, which 
have one or more dimensions of size zero, are useful as placeholders in many 
`ndarray` operations, such as concatenating a zero-size `ndarray` with another 
`ndarray`. MXNet does not support them because the empty shape `()` and zero 
dimension sizes are reserved to indicate unknown shape information, which must 
be filled in during the shape inference stage before tensor computations can 
proceed.
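   Both kinds of tensor are first-class citizens in NumPy, as this small example shows:

```python
import numpy as np

# A zero-dim tensor: shape () represents a scalar.
s = np.array(3.14)
assert s.shape == () and s.ndim == 0

# A zero-size tensor: one dimension has size 0, yet it is a valid operand.
empty = np.empty((0, 16, 256))
x = np.ones((4, 16, 256))

# Concatenating a zero-size array with another array acts as a no-op
# placeholder -- exactly the use case described above.
out = np.concatenate([empty, x], axis=0)
assert out.shape == (4, 16, 256)
```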
   
   ### How to resolve the problem?
   We can first change the current semantics to comply with the NumPy definition.
   1. Change the definition of unknown shapes from `ndim = 0` to `ndim = -1` in 
`TShape` class.
   2. Change the definition of unknown dimension sizes from `dim_size = 0` to 
`dim_size = -1` in `TShape` class.
   
   After this, we need to scan the codebase and modify all the places where 
`shape.ndim() == 0` or `shape.Size() == 0` is used to perform unknown-shape 
checks.
   
   Please note that although MXNet's shape is a type inheriting from 
`nnvm::Tuple`, which is often used to represent a list-like object such as 
`axis=(1, 2, 3)`, we will not change the meaning of an empty tuple. This 
separation of definitions for empty shape and empty tuple keeps their roles 
clearly decoupled.
   
   We propose to break down the effort into the following steps.
   1. Copy `tuple.h` from NNVM to MXNet and rename `nnvm::TShape` to 
`mxnet::TShape`.
   2. Replace all the places in MXNet where `nnvm::Tuple` and `nnvm::TShape` 
are used with `mxnet::Tuple` and `mxnet::TShape`, respectively.
   3. Change the definition of `TShape` in `tuple.h` to use `ndim = -1` to 
indicate unknown shapes and `dim_size = -1` to indicate unknown shape dim sizes.
   4. Modify all the existing shape inference and utility functions where `ndim 
== 0` and `dim_size == 0` are used to accommodate the above changes.
   5. Modify NNVM passes, `InferShape`, `PlanMemory`, and `Gradient`, where 
`nnvm::TShape` is used, to accommodate the above changes.
   6. Add sufficient unit tests.
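   The semantic shift in steps 3 and 4 can be sketched in Python (illustrative only; the real `TShape` logic lives in C++):

```python
# Illustrative sketch of the convention change, not MXNet source.
# Old rule: ndim == 0 and dim_size == 0 mean "unknown", so the shapes
# () and (0, 16, 256) cannot be represented as concrete shapes.
# New rule: -1 means "unknown", freeing 0 and () to be concrete values.

def shape_is_known_old(shape):
    # old convention: empty shape or any zero-size dim means "unknown"
    return len(shape) != 0 and all(d != 0 for d in shape)

def shape_is_known_new(ndim, shape):
    # new convention: ndim == -1 is unknown rank; dim == -1 is unknown size
    return ndim != -1 and all(d != -1 for d in shape)

# A zero-size tensor shape is rejected by the old rule but valid under the new:
assert not shape_is_known_old((0, 16, 256))
assert shape_is_known_new(3, (0, 16, 256))

# A scalar shape () likewise becomes representable:
assert not shape_is_known_old(())
assert shape_is_known_new(0, ())
```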
   
   ### How is backward compatibility guaranteed?
   By default, we do not change the original definition of output shapes in 
shape inference functions; we only change `ndim == 0` to `ndim == -1` for 
unknown-shape verification. No backward compatibility issues are expected, 
except in one case: `NDArray` indexing. Currently, `x[i]` always returns a 
tensor with `ndim >= 1`. We can keep this behavior unchanged and provide a 
global switch that users can turn on to get NumPy-compatible results.
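   A minimal sketch of how such a switch might behave, using NumPy to stand in for `NDArray` (the name `set_np_compat` is hypothetical, not an existing MXNet API):

```python
import numpy as np

# Hypothetical global switch, off by default for backward compatibility.
_np_compat = False

def set_np_compat(flag):
    global _np_compat
    _np_compat = flag

def getitem(arr, i):
    # Legacy behavior: x[i] always returns a tensor with ndim >= 1.
    # NumPy-compatible behavior: x[i] returns a zero-dim scalar.
    result = arr[i]
    if not _np_compat:
        result = np.asarray(result).reshape((1,))
    return result

a = np.array([0, 1, 2])
assert getitem(a, 1).shape == (1,)   # default: backward compatible
set_np_compat(True)
assert getitem(a, 1).shape == ()     # opt-in: NumPy compatible
```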
   
   Previous discussion of this topic can be seen 
[here](https://discuss.mxnet.io/t/rank-0-arrays-in-mxnet-aka-pi-is-wrong/108).
   
   ## Implementation of NumPy operators
   ### What to do?
   To address operator incompatibility with NumPy and alleviate the diverged 
programming experience caused by the operator namespace separation between 
`mxnet.ndarray` and `mxnet.symbol`, we propose creating a new namespace 
`mxnet.numpy`, adopting operator APIs from NumPy, and implementing those APIs 
under the new namespace. `mxnet.numpy` should provide the same imperative 
programming experience as NumPy and will gradually replace all the 
non-neural-network operators in the current codebase. While implementing NumPy 
operators in MXNet, we may be able to leverage TVM to generate high-performance 
kernels 
([ref.](https://docs.tvm.ai/tutorials/get_started.html#sphx-glr-tutorials-get-started-py)).
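   As a concrete illustration of the API gap, here is the NumPy behavior that `mxnet.numpy` would adopt (NumPy-only sketch; the MXNet counterparts currently differ in signature or semantics):

```python
import numpy as np

# np.dot on N-D inputs computes a sum product over the last axis of `a`
# and the second-to-last axis of `b` -- semantics mxnet.numpy would match.
a = np.ones((2, 3, 4))
b = np.ones((4, 5))
out = np.dot(a, b)
assert out.shape == (2, 3, 5)

# np.concatenate takes a Python list of arrays plus an `axis` keyword,
# the API shape that mxnet.numpy operators would mirror.
out2 = np.concatenate([np.zeros((2, 3)), np.ones((2, 3))], axis=0)
assert out2.shape == (4, 3)
```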
   
   ### Can `mxnet.numpy` operators be used in Gluon for hybridization?
   The newly implemented NumPy operators can still be accessed through the 
module (`ndarray`/`symbol`) delegate `F` in Gluon, e.g. `F.numpy.dot`. This 
works because the new operators are still registered under `mxnet.ndarray` and 
`mxnet.symbol` behind the scenes. Users are simply encouraged to access NumPy 
operator APIs through `mxnet.numpy` for pure imperative code, and through Gluon 
APIs for a hybrid coding experience.
   
   ## Where to contribute code?
   A dev branch has been opened for this proposal.
   https://github.com/apache/incubator-mxnet/tree/numpy
   
   @junrushao1994 @szha @eric-haibin-lin @zheng-da @yzhliu 
