yzhliu opened a new issue #15465: [RFC] Integrate TVM into Apache MXNet
URL: https://github.com/apache/incubator-mxnet/issues/15465

# Problem Statement

Currently in MXNet we implement operator kernels in C++. Developers need to specify the detailed logic of each computation, which slows down the development process. Given that we are moving toward NumPy compatibility [1], a large number of operators remain to be implemented. Moreover, we have various backends to support, including CPUs and GPUs from AMD, ARM, Intel, Nvidia, etc. It takes great effort to implement efficient kernels for each of these hardware targets, and just as much to write test cases for every operator+backend combination.

# Proposal

I would therefore like to propose integrating Apache TVM into Apache MXNet, so that we can leverage its ability to implement high-performance operator kernels easily, in Python. Here are some of the advantages:

1. We devised a new approach to implementing MXNet kernels in Python (see PR https://github.com/apache/incubator-mxnet/pull/15345). For example, to implement broadcast add, one can write pure Python:

```python
@defop(name="vadd", target="cpu", auto_broadcast=True,
       dtype=AllTypes, ndim=list(range(6)))
def vadd(dtype, ndim):
    A = tvm.placeholder([tvm.var() for _ in range(ndim)], name='A', dtype=dtype)
    B = tvm.placeholder([tvm.var() for _ in range(ndim)], name='B', dtype=dtype)
    C = tvm.compute([tvm.var() for _ in range(ndim)],
                    lambda *index: A[index] + B[index], name='C')
    s = tvm.create_schedule(C.op)
    axes = [axis for axis in C.op.axis]
    fused = s[C].fuse(*axes)
    s[C].parallel(fused)
    return s, [A, B, C]
```

The code above is compiled to binary and linked into MXNet as a regular function. Note that the same compute definition can be shared across multiple backends (CPU, GPU, etc.).
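For readers unfamiliar with the `auto_broadcast=True` flag: it gives the generated kernel NumPy-style broadcasting semantics. A minimal pure-Python sketch of the shape rule, for illustration only (`broadcast_shape` is a hypothetical helper, not part of the PR; the real work happens inside the generated TVM kernel):

```python
def broadcast_shape(a, b):
    """NumPy-style broadcast of two shapes: pad the shorter shape with
    leading 1s, then each dimension pair must match or contain a 1."""
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError("incompatible shapes")
        out.append(max(x, y))
    return tuple(out)

# e.g. broadcast_shape((3, 1), (4,)) == (3, 4)
```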
Since this is much more concise than the equivalent C++, it lowers the bar for implementing high-performance kernels and improves the development experience. We expect people to develop more efficiently with this approach.

2. Operators in current TVM are already NumPy-compatible, so we can leverage the efforts there to help our NumPy project.

# Approach

* Build and link the TVM dynamic library into MXNet.
* Build infrastructure for writing operator kernels in Python, including:
  * An approach for registering TVM operators into MXNet.
  * CI changes to enable TVM operator builds and to pack the operator library together with the release binary.

We can enable automatic performance tuning [2] later to further improve performance. Apache TVM has already been integrated as a 3rdparty repository, and some of the nnvm header files are included in the MXNet source code.

# FAQ

Q: Does it increase the binary size of the MXNet release?
A: libtvm_runtime.so is roughly 750 KB, which is fairly small compared to libmxnet.so (~60 MB for CPU, ~500 MB for GPU).

Q: Are TVM operators going to replace the current operators in MXNet?
A: No. This is an alternative way to write kernels. For new operators that are easy to write in this approach, we can benefit from the advantages mentioned above.

Q: Any license problem?
A: TVM is provided under the Apache 2.0 License, and it is currently incubating at the Apache Software Foundation: https://tvm.ai/2019/03/18/tvm-apache-announcement

# Background of TVM

Apache TVM is an open deep learning compiler stack for CPUs, GPUs, and specialized accelerators. It aims to close the gap between productivity-focused deep learning frameworks and performance- or efficiency-oriented hardware backends.

* TVM has shown its ability to achieve decent performance not only on end-to-end neural networks but also on single kernels.
Recent papers and benchmarks have shown that it can outperform even vendors' acceleration libraries: https://tvm.ai/2018/10/03/auto-opt-all
* TVM supports a large number of backends, including Intel, AMD, and ARM CPUs and GPUs, as well as Nvidia GPUs and FPGAs. Reusing TVM's optimized kernels could greatly benefit MXNet's backend support.
* TVM provides a flexible and convenient tool to route data structures and function calls between C++ and other frontend languages, which is helpful for general purposes. See https://docs.tvm.ai/dev/runtime.html#tvm-node-and-compiler-stack for reference.

# Reference

[1] [RFC] Introducing NumPy-compatible coding experience into MXNet - https://github.com/apache/incubator-mxnet/issues/14253
[2] [TVM] Automatic Kernel Optimization for Deep Learning on All Hardware Platforms - https://tvm.ai/2018/10/03/auto-opt-all
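As a rough analogy for what a C++/frontend bridge involves, the snippet below calls a C library function from Python using only the standard-library `ctypes` module. This is a generic illustration, not TVM's PackedFunc API; see the runtime document linked above for the actual mechanism, which additionally handles TVM's own data structures.

```python
import ctypes
import ctypes.util

# Load the C math library and declare the signature of cos(),
# so Python floats are marshalled to/from C doubles correctly.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))  # invokes the C implementation; cos(0.0) == 1.0
```

A bridge like TVM's goes further in both directions: C++ code can also look up and invoke functions registered from the frontend, which is what makes kernels written in Python callable from the MXNet runtime.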
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
