Hello all,

We would like to propose a new mechanism that unifies integration with most external acceleration libraries, including TVM, MKLDNN, TensorRT and more. The main idea is to integrate with external libraries at the level of subgraphs instead of operators. There are a few reasons in favor of the new integration:
* Integration at the level of operators mixes external library operators, such as MKLDNN operators, with MXNet operators and overcomplicates the implementation of the executor. We now have to deal with a lot of unexpected issues (the executor needs to carefully handle data format conversion between different operators; the operators of external libraries are subject to the same memory planning as other MXNet operators, etc.).
* External libraries need to reconstruct the computation graph for better performance (e.g., operator fusion). Integration at the level of subgraphs allows external libraries to perform arbitrary graph transformation and computation.

The proposal below provides both the design and the API for constructing and executing subgraphs.
https://cwiki.apache.org/confluence/display/MXNET/Unified+integration+with+external+acceleration+libraries

Please let me know if you have any comments on this design and API.

Thanks,
Da
