reminisce opened a new issue #14253: [RFC] Introducing NumPy-compatible coding 
experience into MXNet
URL: https://github.com/apache/incubator-mxnet/issues/14253
 
 
   ## Motivation
   Today deep learning scientists spend the majority of their time on data 
processing, debugging tensor algorithms, and tuning model parameters, rather 
than architecting models from scratch, thanks to the abundance of pre-trained 
models in deep learning model zoos. This has made the usability of tensor APIs 
a key factor in a framework's wide adoption.
   
   MXNet was initially designed with a focus on memory efficiency, computation 
throughput, and scalability. Usability problems have since surfaced as more and 
more models exhibit dynamic behavior, e.g. tensors whose shapes are unknown 
before runtime, control flow that depends on runtime results, etc. Below we 
highlight users' most frequent usability complaints.
   - Scalar tensors (aka zero-dim tensors) are not supported. For example, 
given `a = [0, 1, 2]`, `a[1]` will generate an `NDArray` of shape `(1,)`, 
instead of `()` as in NumPy.
   - Zero-size tensors are not supported. For example, a tensor of shape `(0, 16, 
256)` cannot be passed to an operator, because our system currently treats 0, 
the first dimension size, as unknown rather than a concrete number.
   - Many operators' signatures and functionality are not NumPy compatible, 
e.g. `nd.dot` vs. `np.dot`, `nd.concatenate` vs. `np.concatenate`, etc.
   - Many NumPy operators are missing. See the [reference 
link](https://github.com/apache/incubator-mxnet/issues?q=is%3Aissue+numpy+label%3ANumpy)
 to GitHub issues.
   - Operators whose outputs' shapes can only be determined at runtime are not 
supported, e.g. `data[data < 0]` cannot run.
   - Diverged programming experience due to the separation of imperative and 
symbolic operators registered under `mxnet.ndarray` and `mxnet.symbol`.
   - Control flow operators are hard to use. Users have to understand the 
complicated signatures of control flow operators, instead of writing native 
Python code using `for`, `while`, `if/else`, etc.
   For example, we have learned the hard way that it does not make much sense 
to ask users to write code like the following to perform a cumulative sum.
   ```python
   def sum(state, i):
       s = state + data[i]
       return s, [s, i + 1]
   
   def sum_cond(state, i):
       return i < 4
       
    out, state = F.contrib.while_loop(sum_cond, sum, [F.zeros((1,)), F.zeros((1,))],
                                      max_iterations=5)
   ```
   Instead, users should be able to write native Python code like the following 
and, if required, let the framework serialize it into a computation graph for 
optimization and deployment.
   ```python
   data = np.arange(5)
   out = 0
   i = 0
   while i < 5:
    out = out + data[i]
    i += 1
   ```
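For reference, the NumPy behavior targeted by the first pain points above can be demonstrated directly (a minimal illustration using NumPy alone):

```python
import numpy as np

a = np.array([0, 1, 2])

# Integer indexing yields a zero-dim (scalar) tensor of shape (),
# not (1,) as in MXNet today.
assert a[1].shape == ()

# Boolean-mask indexing produces an output whose shape is only known
# at runtime -- here it happens to select two elements.
data = np.array([-1.0, 2.0, -3.0, 4.0])
selected = data[data < 0]
assert selected.shape == (2,)
```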
   
   It is not hard to see that all of the above pain points stem from the lack 
of a NumPy-compatible coding experience in MXNet. Fully addressing control flow 
support and consolidating the imperative and symbolic coding styles requires 
fundamental changes to the codebase, such as a new graph IR and executor; that 
effort is extremely non-trivial and should be executed with a long-term plan. 
In the meantime, we can improve usability by fixing the zero-dim/zero-size 
tensor issues and implementing NumPy operators in MXNet. We discuss how to 
achieve these short-term goals below.
   
   ## Support of zero-dim and zero-size tensors
   ### What's the problem?
   Zero-dim and zero-size tensors are valid tensors in NumPy. The former, whose 
shape is `()`, represent scalars in `numpy.ndarray` format. The latter, which 
have one or more dimensions of size zero, are useful as placeholders in many 
`ndarray` operations, such as concatenating a zero-size `ndarray` with another 
`ndarray`. MXNet does not support them because the empty shape `()` and zero 
dimension sizes are reserved to indicate unknown shape information, which must 
be filled in during the shape inference stage before tensor computations can 
proceed.
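   Both kinds of tensor are first-class citizens in NumPy, as this small example shows:

```python
import numpy as np

# A zero-dim tensor: shape () represents a scalar.
s = np.array(3.14)
assert s.shape == () and s.ndim == 0

# A zero-size tensor: one dimension has size 0, yet it is a valid operand.
empty = np.empty((0, 16, 256))
x = np.ones((4, 16, 256))

# Concatenating a zero-size array with another array acts as a no-op
# placeholder -- exactly the use case described above.
out = np.concatenate([empty, x], axis=0)
assert out.shape == (4, 16, 256)
```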
   
   ### How to resolve the problem?
   We can first change the current semantics to comply with the NumPy definition.
   1. Change the definition of unknown shapes from `ndim = 0` to `ndim = -1` in 
`TShape` class.
   2. Change the definition of unknown dimension sizes from `dim_size = 0` to 
`dim_size = -1` in `TShape` class.
   
   After this, we need to scan the codebase and modify all the places where 
`shape.ndim() == 0` or `shape.Size() == 0` is used to perform unknown-shape 
checks.
   
   Please note that although MXNet's shape is a type inheriting from 
`nnvm::Tuple`, which is often used to represent a list-like object such as 
`axis=(1, 2, 3)`, we will not change the meaning of an empty tuple. This 
separation of definitions for empty shape and empty tuple keeps their roles 
clearly decoupled.
   
   We propose to break down the effort into the following steps.
   1. Copy `tuple.h` from NNVM to MXNet and rename `nnvm::TShape` to 
`mxnet::TShape`.
   2. Replace all the places in MXNet where `nnvm::Tuple` and `nnvm::TShape` 
are used with `mxnet::Tuple` and `mxnet::TShape`, respectively.
   3. Change the definition of `TShape` in `tuple.h` to use `ndim = -1` to 
indicate unknown shapes and `dim_size = -1` to indicate unknown shape dim sizes.
   4. Modify all the existing shape inference and utility functions where `ndim 
== 0` and `dim_size == 0` are used to accommodate the above changes.
   5. Modify NNVM passes, `InferShape`, `PlanMemory`, and `Gradient`, where 
`nnvm::TShape` is used, to accommodate the above changes.
   6. Add sufficient unit tests.
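   The semantic shift in steps 3 and 4 can be sketched in Python (illustrative only; the real `TShape` logic lives in C++):

```python
# Illustrative sketch of the convention change, not MXNet source.
# Old rule: ndim == 0 and dim_size == 0 mean "unknown", so the shapes
# () and (0, 16, 256) cannot be represented as concrete shapes.
# New rule: -1 means "unknown", freeing 0 and () to be concrete values.

def shape_is_known_old(shape):
    # old convention: empty shape or any zero-size dim means "unknown"
    return len(shape) != 0 and all(d != 0 for d in shape)

def shape_is_known_new(ndim, shape):
    # new convention: ndim == -1 is unknown rank; dim == -1 is unknown size
    return ndim != -1 and all(d != -1 for d in shape)

# A zero-size tensor shape is rejected by the old rule but valid under the new:
assert not shape_is_known_old((0, 16, 256))
assert shape_is_known_new(3, (0, 16, 256))

# A scalar shape () likewise becomes representable:
assert not shape_is_known_old(())
assert shape_is_known_new(0, ())
```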
   
   ### How is backward compatibility guaranteed?
   By default, we do not change the original definition of output shapes in 
shape inference functions; we only change `ndim == 0` to `ndim == -1` for 
unknown-shape verification. No backward compatibility issues are expected, 
except in one case: `NDArray` indexing. Currently, `x[i]` always returns a 
tensor with `ndim >= 1`. We can keep this behavior unchanged and provide a 
global switch that users can turn on to get NumPy-compatible results.
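   A minimal sketch of how such a switch might behave, using NumPy to stand in for `NDArray` (the name `set_np_compat` is hypothetical, not an existing MXNet API):

```python
import numpy as np

# Hypothetical global switch, off by default for backward compatibility.
_np_compat = False

def set_np_compat(flag):
    global _np_compat
    _np_compat = flag

def getitem(arr, i):
    # Legacy behavior: x[i] always returns a tensor with ndim >= 1.
    # NumPy-compatible behavior: x[i] returns a zero-dim scalar.
    result = arr[i]
    if not _np_compat:
        result = np.asarray(result).reshape((1,))
    return result

a = np.array([0, 1, 2])
assert getitem(a, 1).shape == (1,)   # default: backward compatible
set_np_compat(True)
assert getitem(a, 1).shape == ()     # opt-in: NumPy compatible
```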
   
   Previous discussion of this topic can be seen 
[here](https://discuss.mxnet.io/t/rank-0-arrays-in-mxnet-aka-pi-is-wrong/108).
   
   ## Implementation of NumPy operators
   ### What to do?
   To address operator incompatibility with NumPy and alleviate the diverged 
programming experience caused by the operator namespace separation between 
`mxnet.ndarray` and `mxnet.symbol`, we propose creating a new namespace 
`mxnet.numpy`, adopting operator APIs from NumPy, and implementing those APIs 
under the new namespace. `mxnet.numpy` should provide the same imperative 
programming experience as NumPy and will gradually replace all the 
non-neural-network operators in the current codebase. While implementing NumPy 
operators in MXNet, we may be able to leverage TVM to generate high-performance 
kernels 
([ref.](https://docs.tvm.ai/tutorials/get_started.html#sphx-glr-tutorials-get-started-py)).
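   As a concrete illustration of the API gap, here is the NumPy behavior that `mxnet.numpy` would adopt (NumPy-only sketch; the MXNet counterparts currently differ in signature or semantics):

```python
import numpy as np

# np.dot on N-D inputs computes a sum product over the last axis of `a`
# and the second-to-last axis of `b` -- semantics mxnet.numpy would match.
a = np.ones((2, 3, 4))
b = np.ones((4, 5))
out = np.dot(a, b)
assert out.shape == (2, 3, 5)

# np.concatenate takes a Python list of arrays plus an `axis` keyword,
# the API shape that mxnet.numpy operators would mirror.
out2 = np.concatenate([np.zeros((2, 3)), np.ones((2, 3))], axis=0)
assert out2.shape == (4, 3)
```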
   
   ### Can `mxnet.numpy` operators be used in Gluon for hybridization?
   The newly implemented NumPy operators can still be accessed through the 
module (`ndarray`/`symbol`) delegate `F` in Gluon, e.g. `F.numpy.dot`. This 
works because the new operators are still registered under `mxnet.ndarray` and 
`mxnet.symbol` behind the scenes. Users are simply encouraged to access NumPy 
operator APIs through `mxnet.numpy` for pure imperative code, and through Gluon 
APIs for a hybrid coding experience.
   
   ## Where to contribute code?
   A dev branch has been opened for this proposal.
   https://github.com/apache/incubator-mxnet/tree/numpy
   
   @junrushao1994 @szha @eric-haibin-lin @zheng-da @yzhliu 
