yzhliu opened a new issue #15465: [RFC] Integrate TVM into Apache MXNet
URL: https://github.com/apache/incubator-mxnet/issues/15465
 
 
   # Problem Statement
   
   Currently in MXNet we implement operator kernels in C++. Developers need to 
specify the detailed logic of each computation, which slows down the 
development process. Given that we are moving toward NumPy compatibility [1], a 
large number of operators still need to be implemented. Moreover, we have 
various backends to support, including CPUs and GPUs from AMD, ARM, Intel, 
Nvidia, etc. It takes great effort to implement efficient kernels for each of 
these hardware targets, as well as to write test cases for every 
operator+backend combination.
   
   # Proposal
   
   Thus I would like to propose integrating Apache TVM into Apache MXNet, so 
that we can leverage its ability to implement high-performance operator 
kernels easily (in Python). Here are some of the advantages:
   
   1. We have devised a new approach to implementing MXNet kernels in Python 
(see PR https://github.com/apache/incubator-mxnet/pull/15345 ). For example, to 
implement broadcast add, one can write pure Python:
   
    ```python
    import tvm
    # defop and AllTypes are provided by the TVM-operator infrastructure
    # introduced in PR #15345.

    @defop(name="vadd", target="cpu", auto_broadcast=True,
           dtype=AllTypes, ndim=list(range(6)))
    def vadd(dtype, ndim):
        # One symbolic compute definition covers all dtypes and ranks.
        A = tvm.placeholder(shape=[tvm.var() for _ in range(ndim)], name='A',
                            dtype=dtype)
        B = tvm.placeholder(shape=[tvm.var() for _ in range(ndim)], name='B',
                            dtype=dtype)
        C = tvm.compute([tvm.var() for _ in range(ndim)],
                        lambda *index: A[index] + B[index], name='C')
        # CPU schedule: fuse all output axes and parallelize the fused loop.
        s = tvm.create_schedule(C.op)
        fused = s[C].fuse(*C.op.axis)
        s[C].parallel(fused)
        return s, [A, B, C]
    ```
   
   The code above will be compiled to binary and linked into MXNet as a regular 
function. Note that the same piece of compute definition can be shared across 
multiple backends (CPU, GPU, etc.). Since it is much more concise than the 
equivalent C++ implementation, it lowers the bar for writing high-performance 
kernels and improves the development experience. We expect developers to be 
more productive with this approach; a GPU variant of the same kernel is 
sketched right after this list.
   
   2. Operators in current TVM are already NumPy-compatible, so we can leverage 
the efforts there to help our NumPy project; a small illustration follows 
after this list.
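   
   For illustration, here is a minimal sketch of what a GPU variant of the 
kernel from point 1 could look like. The name `vadd_gpu`, the `target="gpu"` 
value, and the concrete split factor are assumptions for illustration; only 
the schedule differs, while the compute definition stays the same.
   
    ```python
    import tvm

    @defop(name="vadd_gpu", target="gpu", auto_broadcast=True,
           dtype=AllTypes, ndim=list(range(1, 6)))
    def vadd_gpu(dtype, ndim):
        # Same compute definition as the CPU kernel above.
        A = tvm.placeholder(shape=[tvm.var() for _ in range(ndim)], name='A',
                            dtype=dtype)
        B = tvm.placeholder(shape=[tvm.var() for _ in range(ndim)], name='B',
                            dtype=dtype)
        C = tvm.compute([tvm.var() for _ in range(ndim)],
                        lambda *index: A[index] + B[index], name='C')
        # GPU schedule: fuse the axes, then bind them to CUDA blocks/threads.
        s = tvm.create_schedule(C.op)
        fused = s[C].fuse(*C.op.axis)
        bx, tx = s[C].split(fused, factor=64)
        s[C].bind(bx, tvm.thread_axis("blockIdx.x"))
        s[C].bind(tx, tvm.thread_axis("threadIdx.x"))
        return s, [A, B, C]
    ```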
   
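   As a small illustration of point 2 (a sketch, not part of the proposed 
code), TVM's operator inventory `topi` already follows NumPy broadcasting 
rules:
   
    ```python
    import tvm
    import topi  # TVM's operator inventory

    # NumPy-style broadcasting: (3, 1) + (1, 4) -> (3, 4)
    A = tvm.placeholder((3, 1), name='A', dtype='float32')
    B = tvm.placeholder((1, 4), name='B', dtype='float32')
    C = topi.add(A, B)

    s = tvm.create_schedule(C.op)
    # Building requires an LLVM-enabled TVM; "llvm" targets the local CPU.
    func = tvm.build(s, [A, B, C], target="llvm")
    ```
   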
   # Approach
   
   * Build and link the TVM dynamic library into MXNet.
   * Build the infrastructure for writing operator kernels in Python, including:
       * The approach for registering TVM operators into MXNet (a loading 
sketch follows at the end of this section)
       * Modifying CI to enable the TVM operator build, and packing the 
operator library together with the release binary. We can enable automatic 
performance tuning [2] later to further improve performance.
   
   Currently, Apache TVM is already integrated as a 3rdparty repository, and 
some of the nnvm header files are included in the MXNet source code.
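   
   To make the registration and packaging step more concrete, here is a 
minimal sketch assuming the defop-decorated kernels are compiled ahead of time 
into a shared library. The library name "libtvmop.so" and the exported 
function name "vadd" are assumptions; the actual symbol may encode dtype/ndim, 
depending on how defop mangles names.
   
    ```python
    import numpy as np
    import tvm

    # Hypothetical: load the pre-built operator library produced at build time.
    mod = tvm.module.load("./libtvmop.so")
    # Hypothetical: look up one compiled kernel by name.
    vadd = mod["vadd"]

    a = tvm.nd.array(np.random.uniform(size=(3, 4)).astype("float32"))
    b = tvm.nd.array(np.random.uniform(size=(3, 4)).astype("float32"))
    c = tvm.nd.array(np.zeros((3, 4), dtype="float32"))
    vadd(a, b, c)  # writes the element-wise sum into c
    ```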
   
   # FAQ
   
   Q: Does it increase the binary size of the MXNet release?
   A: libtvm_runtime.so is roughly 750 KB, which is fairly small compared to 
libmxnet.so (~60 MB for CPU, ~500 MB for GPU).
   
   Q: Are TVM operators going to replace the current operators in MXNet?
   A: No. It is an alternative way to write kernels. For new operators that are 
easy to write with this approach, we can benefit from the advantages mentioned 
above.
   
   Q: Any license problem?
   A: TVM is provided under the Apache 2.0 License, and it is currently 
incubating at the Apache Software Foundation: 
https://tvm.ai/2019/03/18/tvm-apache-announcement
   
   # Background of TVM
   
   Apache TVM is an open deep learning compiler stack for CPUs, GPUs, and 
specialized accelerators. It aims to close the gap between productivity-focused 
deep learning frameworks and performance- or efficiency-oriented hardware 
backends.
   
   * TVM has shown its ability to deliver decent performance not only on 
end-to-end neural networks, but also on single kernels. Recent papers and 
benchmarks show it can match or even outperform vendors' acceleration 
libraries: https://tvm.ai/2018/10/03/auto-opt-all
   * TVM supports a large number of backends, including Intel, AMD, and ARM 
CPUs and GPUs, as well as Nvidia GPUs and FPGAs. Reusing TVM's optimized 
kernels could greatly benefit MXNet's backend support.
   * TVM provides a flexible and convenient mechanism to route data structures 
and function calls between C++ and other frontend languages, which is helpful 
for general purposes; a small example follows right after this list. See 
https://docs.tvm.ai/dev/runtime.html#tvm-node-and-compiler-stack for reference.
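   
   As a small illustration of that last point (a sketch; the function name 
"demo.add_one" is made up), TVM's PackedFunc mechanism lets Python and C++ 
register functions in one global registry and call them across the language 
boundary:
   
    ```python
    import tvm

    # Register a Python function into TVM's global function registry.
    @tvm.register_func("demo.add_one")
    def add_one(x):
        return x + 1

    # The registered function can now be looked up by name, from Python or
    # from C++ (via tvm::runtime::Registry::Get), and called like a normal
    # function.
    f = tvm.get_global_func("demo.add_one")
    assert f(10) == 11
    ```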
   
   # Reference
   
   [1] [RFC] Introducing NumPy-compatible coding experience into MXNet - 
https://github.com/apache/incubator-mxnet/issues/14253
   [2] [TVM] Automatic Kernel Optimization for Deep Learning on All Hardware 
Platforms - https://tvm.ai/2018/10/03/auto-opt-all
   
