Hi Da and other developers,

It's a great idea to confine external acceleration libraries to a well-defined 
scope via subgraphs. I am not very familiar with the designs of TVM and 
TensorRT, but from the MKL-DNN backend side, here are my concerns about this 
proposal:

1. Is the subgraph mechanism intended for all third-party acceleration 
libraries, or only for those that use different data layouts? I believe cuDNN 
also uses a non-default data layout (e.g., NHWC) for int8, so does the cuDNN 
path also need to follow this proposal? I notice that cuDNN is not mentioned 
in the proposal.
2. Would subgraphs break the execution of the imperative Gluon interface? If 
we don't apply subgraphs to imperative Gluon, does that mean imperative Gluon 
models cannot benefit from any acceleration library?
3. Currently, most issues in the MKL-DNN backend come from the interchange 
between MXNet's default NDArray and MKL-DNN memory. Even after subgraphs are 
applied to the MKL-DNN backend, there will still be fallback paths for inputs 
that MKL-DNN does not support and for inputs that are views of other tensors. 
So we still need to deal with the layout transformation between MKL-DNN 
specific layouts and MXNet's default layout; the current subgraph design 
cannot avoid this. A rough illustration of what such a reorder involves is 
sketched below.
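
(The NumPy snippet below is purely illustrative and is not MXNet or MKL-DNN 
code. It only shows the kind of data movement a reorder between the default 
NCHW layout and a blocked layout such as nChw16c implies; every fallback 
boundary pays for such a reorder in both directions.)

import numpy as np

# Illustrative only: reorder a plain NCHW tensor into a blocked nChw16c
# layout and back. Real implementations also pad the channel dimension
# when it is not a multiple of the block size.

def nchw_to_nchw16c(x):
    n, c, h, w = x.shape
    assert c % 16 == 0, "real code pads the channel dimension instead"
    return x.reshape(n, c // 16, 16, h, w).transpose(0, 1, 3, 4, 2).copy()

def nchw16c_to_nchw(x):
    n, cb, h, w, b = x.shape
    return x.transpose(0, 1, 4, 2, 3).reshape(n, cb * b, h, w).copy()

x = np.random.rand(2, 32, 8, 8).astype(np.float32)
assert np.array_equal(nchw16c_to_nchw(nchw_to_nchw16c(x)), x)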

To move the MKL-DNN backend from 'experimental' to 'GA' in the 1.3 release, we 
are working intensively on adding more unit tests and improving its stability. 
Hopefully, these fixes and tests will be upstreamed and merged soon. Meanwhile, 
we are also trying to figure out how to improve the subgraph solution so that 
it properly addresses the current issues and offers better extensibility in 
the future.

Any comments and suggestions will be highly appreciated. Thanks.

-tao

-----Original Message-----
From: Zheng, Da [mailto:[email protected]] 
Sent: Saturday, June 2, 2018 4:38 AM
To: [email protected]
Subject: A proposal for unified integration with external acceleration libraries

Hello all,

We would like to propose a new mechanism that unifies the integration with most 
external acceleration libraries, including TVM, MKLDNN, TensorRT, and more. The 
main idea is to integrate with the external libraries at the level of subgraphs 
instead of individual operators.
There are a few reasons in favor of the new integration:

  *   Integration at the level of operators mixes the external library 
operators, such as MKLDNN's, with MXNet operators and overcomplicates the 
implementation of the executor. We now have to deal with many unexpected 
issues (the executor needs to carefully handle data format conversion between 
different operators; the operators of external libraries are subject to the 
same memory planning as other MXNet operators; etc.).
  *   External libraries need to reconstruct the computation graph for better 
performance (e.g., operator fusion). Integration at the level of subgraphs 
allows external libraries to perform arbitrary graph transformations and 
computation.

The proposal below provides both the design and the API for constructing and 
executing subgraphs.
https://cwiki.apache.org/confluence/display/MXNET/Unified+integration+with+external+acceleration+libraries
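
(As a toy illustration only, and not the API described in the linked document, 
the Python sketch below shows what integration at the level of subgraphs 
means: maximal runs of operators that a hypothetical backend supports are 
collapsed into single subgraph nodes, so the executor never interleaves 
backend operators with stock MXNet operators.)

# Assumed capability list of a hypothetical backend, for illustration.
SUPPORTED = {"conv", "batchnorm", "relu"}

def partition(ops):
    """Collapse maximal runs of supported ops into ('subgraph', [...]) nodes."""
    result, run = [], []
    for op in ops:
        if op in SUPPORTED:
            run.append(op)
            continue
        if run:
            result.append(("subgraph", run))
            run = []
        result.append(op)
    if run:
        result.append(("subgraph", run))
    return result

print(partition(["conv", "batchnorm", "relu", "softmax", "conv", "relu"]))
# [('subgraph', ['conv', 'batchnorm', 'relu']), 'softmax',
#  ('subgraph', ['conv', 'relu'])]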

Please let me know if you have any comments on this design and API.

Thanks,
Da
