Re: A proposal for unified integration with external acceleration libraries

Zheng, Da Mon, 04 Jun 2018 11:52:25 -0700

Hi Tao,

Thanks for your feedback.


For your questions:
1. This subgraph strategy is just a mechanism for integration with external 
libraries. We can use it if it provides benefits. It seems to me that CuDNN 
doesn't benefit much from this strategy. Although NHWC might be non-default, 
this layout just interprets dimensions of an array differently, which is very 
different from MKLDNN formats. The meaning of the dimensions matters to only a 
few operators, so any operator that doesn't need to interpret dimensions can 
run on the arrays without any modification. I don't think it is necessary to 
isolate CuDNN operators from the other MXNet operators.
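
To illustrate the layout point, here is a minimal sketch in plain Python. It is 
not MXNet or CuDNN code; the helpers `offset`, `relu` and `convert` are 
hypothetical names made up for this example. It assumes a 4-D array stored in a 
flat buffer and shows that only the indexing function needs to know the layout, 
while an elementwise operator runs on the buffer unchanged.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int, int]  # always given in logical (N, C, H, W) order


def offset(layout: str, shape: Shape, n: int, c: int, h: int, w: int) -> int:
    """Flat offset of logical element (n, c, h, w) for a buffer in `layout`."""
    _, C, H, W = shape
    if layout == "NCHW":
        return ((n * C + c) * H + h) * W + w
    if layout == "NHWC":
        return ((n * H + h) * W + w) * C + c
    raise ValueError("unknown layout: " + layout)


def relu(buf: List[float]) -> List[float]:
    """Elementwise operator: never looks at the layout at all."""
    return [max(0.0, x) for x in buf]


def convert(buf: List[float], shape: Shape, src: str, dst: str) -> List[float]:
    """Copy a buffer from layout `src` to layout `dst`."""
    N, C, H, W = shape
    out = [0.0] * len(buf)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    out[offset(dst, shape, n, c, h, w)] = buf[offset(src, shape, n, c, h, w)]
    return out


shape = (1, 2, 2, 2)
x_nchw = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
x_nhwc = convert(x_nchw, shape, "NCHW", "NHWC")

# Applying relu before or after the layout change gives the same buffer.
assert relu(x_nhwc) == convert(relu(x_nchw), shape, "NCHW", "NHWC")
```

Only operators that index by channel or spatial position would have to call 
something like `offset`; everything else is layout-agnostic in this sketch.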

2. Imperative Gluon doesn't have subgraphs. We can potentially treat a single 
operator as a subgraph, so the strategy still works for imperative Gluon. 
However, the question is why we want to make it work for imperative Gluon. 
Imperative Gluon is mainly used for debugging, where performance matters less, 
while the majority of the acceleration libraries I mentioned in the proposal 
are for accelerating inference and model serving. MKLDNN is probably the only 
exception. In imperative Gluon mode, we can have MKLDNN operators always output 
arrays in the default format.

3. You are absolutely right. The subgraph strategy can't avoid data conversion 
when conversion is needed. Currently, if the operators can understand both 
default and MKLDNN NDArrays, it works fine and we have spent a lot of time 
making this work well. However, the current MKLDNN backend doesn't handle the 
interaction between MKLDNN operators and non-MKLDNN operators well. This is 
more than a simple conversion between default NDArrays and MKLDNN NDArrays. To 
make this work, our choices are to 
* make all operators (the ones that use FComputeEx) understand MKLDNN 
NDArrays. This isn't scalable. It requires a lot of modifications to the 
operators. In the future, we might have more backends and we would need to do 
the same for each of them.
* have the executor recognize MKLDNN operators and perform data conversion. 
This makes the executor complex, since it needs to understand all backends.
* use the subgraph strategy to isolate MKLDNN operators. This is preferred for 
MKLDNN because the subgraph strategy is useful for many purposes (e.g., 
integration with acceleration libraries, dynamic shape inference, etc). We 
don't need to do much to make the subgraph strategy work well with MKLDNN, and 
it keeps the executor simple and easy to maintain.
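
As a rough illustration of the third option, here is a minimal sketch in plain 
Python. It is not the API from the proposal: the function `partition`, the 
operator names and the "supported" set are all hypothetical, and the graph is 
simplified to a linear sequence of operators (a real computation graph is a 
DAG). Consecutive operators that the backend supports are grouped into one 
subgraph, which the executor can then treat as a single node.

```python
from typing import List, Set, Union

Node = Union[str, List[str]]  # a plain operator, or a subgraph of operators


def partition(ops: List[str], supported: Set[str]) -> List[Node]:
    """Group runs of consecutive backend-supported operators into subgraphs."""
    plan: List[Node] = []
    current: List[str] = []
    for op in ops:
        if op in supported:
            current.append(op)
            continue
        if current:  # an unsupported operator closes the open subgraph
            plan.append(current)
            current = []
        plan.append(op)
    if current:
        plan.append(current)
    return plan


ops = ["conv", "relu", "pooling", "slice", "conv", "softmax"]
plan = partition(ops, supported={"conv", "relu", "pooling"})
assert plan == [["conv", "relu", "pooling"], "slice", ["conv"], "softmax"]
```

In this sketch the executor only has to handle format conversion at the edges 
of each subgraph node, and it needs no knowledge of what happens inside.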
Another problem with the current implementation is that MKLDNN NDArrays are 
subject to the default memory planning of MXNet (this means the memory of an 
MKLDNN NDArray may be reused within a computation graph). This problem caused a 
few bugs in the past and the fixes made the executor complex. The subgraph 
strategy can solve this problem in a cleaner way by using a different memory 
planning policy inside the MKLDNN subgraph (e.g., disabling NDArray reuse 
inside the subgraph).
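
To make the difference between the two planning policies concrete, here is a 
toy sketch in plain Python. It is not MXNet's memory planner; `plan_buffers` is 
a hypothetical helper, and it assumes a linear chain of operators where the 
output of step i is read only by step i + 1.

```python
from typing import List


def plan_buffers(num_ops: int, reuse: bool) -> List[int]:
    """Return the buffer id assigned to the output of each op in a chain."""
    free: List[int] = []        # buffers whose contents are no longer needed
    assignment: List[int] = []  # assignment[i] is the buffer for op i's output
    next_id = 0
    for step in range(num_ops):
        # The output of step - 2 was last read by step - 1, so it is dead now.
        if reuse and step >= 2:
            free.append(assignment[step - 2])
        if free:
            buf = free.pop()
        else:
            buf = next_id
            next_id += 1
        assignment.append(buf)
    return assignment


# Default-style planning: two buffers are recycled along the chain.
assert plan_buffers(4, reuse=True) == [0, 1, 0, 1]
# Reuse disabled, as suggested for inside a subgraph: one buffer per output.
assert plan_buffers(4, reuse=False) == [0, 1, 2, 3]
```

With the subgraph strategy, the outer graph could keep the first policy while 
each subgraph picks its own, which is the separation described above.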

Best,
Da
 
On 6/3/18, 10:28 PM, "Lv, Tao A" <[email protected]> wrote:

    
    Hi Da and other developers,
    
    It's a great idea to confine external acceleration libs to a certain scope 
and subgraph. I am not quite familiar with TVM and TensorRT's design. But from 
the side of the MKL-DNN backend, here are my concerns on this proposal:
    
    1. Is subgraph for all third-party acceleration libraries or just for those 
that have different data layouts? I guess cudnn also uses a non-default data 
layout (say NHWC) for int8. So does the cudnn path also need to follow this 
proposal? I ask because cudnn is not mentioned in the proposal.
    2. Would subgraph break the execution of imperative gluon interfaces? If we 
don't apply subgraph to imperative gluon, does that mean imperative gluon 
models cannot benefit from any acceleration libraries?
    3. Currently, most issues of the mkldnn backend come from the interchange 
between mxnet default ndarray and mkldnn memory. Even after subgraph is applied 
to the mkldnn backend, there will still be some fallback processes for those 
inputs which are not supported by mkldnn or those inputs which are views of 
other tensors. So we still need to deal with the layout transformation between 
mkldnn specific layouts and the mxnet default layout. We cannot avoid these 
with the current design of subgraph.
    
    To move the mkldnn backend from 'experimental' to 'GA' in the 1.3 release, 
we are working intensively to add more unit tests and improve its stability. 
Hopefully, these fixes and tests will be upstreamed and merged soon. Meanwhile, 
we are also trying to figure out how to improve the subgraph solution to 
properly address current issues and allow better extensibility in the future.
    
    Any comments and suggestions will be highly appreciated. Thanks.
    
    -tao
    
    -----Original Message-----
    From: Zheng, Da [mailto:[email protected]] 
    Sent: Saturday, June 2, 2018 4:38 AM
    To: [email protected]
    Subject: A proposal for unified integration with external acceleration 
libraries
    
    Hello all,
    
    We would like to propose a new mechanism that unifies the integration with 
most of the external acceleration libraries, including TVM, MKLDNN, TensorRT 
and more. The main idea is to integrate with the external libraries at the 
level of subgraphs instead of operators.
    There are a few reasons in favor of the new integration:
    
      *   Integration at the level of operators mixes the external library 
operators, such as MKLDNN, with MXNet operators and makes the implementation of 
the executor overcomplicated. We now have to deal with a lot of unexpected 
issues (the executor needs to carefully deal with data format conversion 
between different operators; the operators of external libraries are subject to 
the same memory planning as other MXNet operators, etc).
      *   External libraries need to reconstruct the computation graph for 
better performance (e.g., operator fusion). Integration at the level of 
subgraphs allows external libraries to perform arbitrary graph transformation 
and computation.
    
    The proposal below provides both the design and the API for constructing 
subgraphs and executing subgraphs.
    
https://cwiki.apache.org/confluence/display/MXNET/Unified+integration+with+external+acceleration+libraries
    
    Please let me know if you have any comments on this design and API.
    
    Thanks,
    Da
