eric-haibin-lin opened a new issue #13598: More fine-grained operator implementation dispatch & memory planning flow
URL: https://github.com/apache/incubator-mxnet/issues/13598
 
 
   ## Existing Execution Flow
   ```
   g = graph()
   shapes = g.infer_shape()
   types = g.infer_type()
   storage_types, dispatch_modes = g.infer_storage_type()
   memory_plan = nnvm::plan_memory() // which calls node.finplace_option(node.attrs)
   for node in g:
     fcompute = get_fcompute(node)
     fcompute(x)
   ```
   ### Drawbacks of the existing flow
   - The selection of the MKL/CPU/GPU/CUDNN implementation happens after graph attribute inference and memory planning. **Memory planning is therefore not aware of the implementation that will be used for execution, which may lead to a sub-optimal plan.** For example, the available memory inplace options vary with the accelerator backend (newer versions of CUDNN enable x/dx inplace for _backward_conv); see the registration sketch after this list.
   - Some sparse operators need to access dtype/shape information to decide which implementation to invoke for execution, and whether to perform a fallback. This information is not yet exposed by the existing infer storage type interface.
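   
   To illustrate the first point, here is a minimal sketch (assuming the standard NNVM `FInplaceOption` registration style) of why the inplace query cannot currently account for the backend: the callback only receives `NodeAttrs`, with no information about which kernel will eventually run.
   ```
   #include <nnvm/node.h>
   #include <nnvm/op_attr_types.h>
   
   NNVM_REGISTER_OP(_backward_Convolution)
   .set_attr<nnvm::FInplaceOption>("FInplaceOption",
     [](const nnvm::NodeAttrs& attrs) {
       // Only NodeAttrs is visible here -- whether the CUDNN kernel
       // (which supports x/dx inplace) or the plain GPU kernel will run
       // is unknown, so the answer must be conservative.
       return std::vector<std::pair<int, int>>{};
     });
   ```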
   
   ## Alternative Flow
   Op implementations:
   ```
   void ConvolutionComputeCUDNN(const nnvm::NodeAttrs& attrs,
                                const OpContext& ctx,
                                const std::vector<TBlob>& inputs,
                                const std::vector<OpReqType>& req,
                                const std::vector<TBlob>& outputs) {
     // CUDNN implementation goes here
   }
   
   // Operates on NDArrays because it is dispatched via FComputeEx.
   void ConvolutionComputeMKL(const nnvm::NodeAttrs& attrs,
                              const OpContext& ctx,
                              const std::vector<NDArray>& inputs,
                              const std::vector<OpReqType>& req,
                              const std::vector<NDArray>& outputs) {
     // MKL implementation goes here
   }
   
   void ConvolutionComputeGPU(const nnvm::NodeAttrs& attrs,
                              const OpContext& ctx,
                              const std::vector<TBlob>& inputs,
                              const std::vector<OpReqType>& req,
                              const std::vector<TBlob>& outputs) {
     // GPU implementation goes here
   }
   
   void ConvolutionComputeCPU(const nnvm::NodeAttrs& attrs,
                              const OpContext& ctx,
                              const std::vector<TBlob>& inputs,
                              const std::vector<OpReqType>& req,
                              const std::vector<TBlob>& outputs) {
     // CPU implementation goes here
   }
   ```
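   
   One way to carry the selected kernel is a function-pointer slot on `NodeAttrs`. The following is a hypothetical sketch; the field name `exec_func` comes from the pseudocode in this proposal, while the type alias and struct layout are assumptions, not an existing MXNet API.
   ```
   // Hypothetical: shared signature of the dense (TBlob-based) kernels above,
   // matching MXNet's FCompute convention. A plain function pointer is used so
   // that implementations can be compared with == (see FInplaceOption below).
   using ExecFn = void (*)(const nnvm::NodeAttrs&,
                           const OpContext&,
                           const std::vector<TBlob>&,
                           const std::vector<OpReqType>&,
                           const std::vector<TBlob>&);
   
   // Hypothetical extension of nnvm::NodeAttrs (existing fields elided).
   // FInferStorageTypeEx fills exec_func; nullptr means "fall back to the
   // registered FCompute lookup". The NDArray-based FComputeEx variant
   // (ConvolutionComputeMKL) would need a second slot or a type-erased
   // wrapper; that detail is glossed over here, as in the pseudocode.
   struct NodeAttrs {
     ExecFn exec_func{nullptr};
   };
   ```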
   New FInferStorageTypeEx interface:
   ```
   void FInferStorageTypeEx(const std::vector<TShape>& in_shapes,
                            const std::vector<int>& in_types,
                            const std::vector<int>& in_stypes,
                            const std::vector<TShape>& out_shapes,
                            const std::vector<int>& out_types,
                            std::vector<int>* out_stypes,
                            int dev_mask,
                            NodeAttrs* attrs,            // mutable
                            DispatchMode* dispatch_mode  // mutable
                            ) {
     // GPU
     if (dev_mask == kGPU) {
       (*out_stypes)[0] = kDefaultStorage;
       *dispatch_mode = DispatchMode::kFCompute;
   #if MXNET_USE_CUDNN
       if (attrs->params.kernel.ndim() == 2 && in_types[0] == mshadow::kFloat32 &&
           in_shapes[0].ndim() == 1 && …) {
         attrs->exec_func = ConvolutionComputeCUDNN;
       } else {
         attrs->exec_func = ConvolutionComputeGPU;
       }
   #else
       attrs->exec_func = ConvolutionComputeGPU;
   #endif
     // CPU
     } else {
   #if MXNET_USE_MKLDNN
       attrs->exec_func = ConvolutionComputeMKL;
       (*out_stypes)[0] = kDefaultStorage;
       *dispatch_mode = DispatchMode::kFComputeEx;
   #else
       attrs->exec_func = ConvolutionComputeCPU;
       ...
   #endif
     }
   }
   ```
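   
   For completeness, a hypothetical registration of the proposed attribute (the attribute name comes from this proposal; the per-op function name `ConvolutionInferStorageTypeEx` is made up for illustration):
   ```
   NNVM_REGISTER_OP(Convolution)
   .set_attr<FInferStorageTypeEx>("FInferStorageTypeEx",
                                  ConvolutionInferStorageTypeEx);
   ```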
   FInplaceOption for convolution:
   ```
   std::vector<std::pair<int, int>> FInplaceOption(const NodeAttrs& attrs) {
     if (attrs.exec_func == ConvolutionComputeCUDNN) {
       // The CUDNN kernel can share memory between input 0 and output 0.
       return {{0, 0}};
     } else {
       return {};
     }
   }
   ```
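   Because FInplaceOption now branches on attrs.exec_func, memory planning sees the same implementation decision that execution will later use, which is exactly what the existing flow cannot guarantee.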
   New Execution Flow:
   ```
   g = graph()
   shapes = g.infer_shape()
   types = g.infer_type()
   if (g.has_attr('FInferStorageTypeEx')) {
     storage_types, dispatch_modes = g.infer_storage_type_ex()
   } else {
     storage_types, dispatch_modes = g.infer_storage_type()
   }
   memory_plan = nnvm::plan_memory() // which calls node.finplace_option(node.attrs)
   for node in g:
     if (node.attrs.exec_func) {
       fcompute = node.attrs.exec_func
     } else {
       fcompute = get_fcompute(node)
     }
     fcompute(x)
   ```
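   
   In C++ terms, the per-node dispatch step in the loop above might look like the following sketch (a minimal illustration under the hypothetical `exec_func` slot described earlier; `RunNode` and `get_fcompute` are illustrative names, not existing APIs):
   ```
   // Prefer the kernel pinned during storage type inference; otherwise fall
   // back to the usual registry lookup, exactly as in the existing flow.
   void RunNode(const nnvm::Node& node, const OpContext& ctx,
                const std::vector<TBlob>& inputs,
                const std::vector<OpReqType>& req,
                const std::vector<TBlob>& outputs) {
     if (node.attrs.exec_func) {            // set by FInferStorageTypeEx
       node.attrs.exec_func(node.attrs, ctx, inputs, req, outputs);
     } else {
       auto fcompute = get_fcompute(node);  // registry lookup, as today
       fcompute(node.attrs, ctx, inputs, req, outputs);
     }
   }
   ```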
   
   @DickJC123 @ptrendx @piiswrong @reminisce 
