apeskov opened a new pull request, #11345:
URL: https://github.com/apache/tvm/pull/11345

   **Summary**
   There are several limitations that prevent the DNNL runtime from being used in 
multi-instance mode. This patch eliminates some of them.
   
   Multi-instance mode means calling the Run() method concurrently from several 
threads on a single instance of DNNLJSONRuntime. 
   
   Changes specific to multi-instance support:
    * Do not modify DNNLJSONRuntime fields from the Run() method; make it behave 
like a "const" method.
    * Use explicit DNNL scratchpads where they are requested.
    * Make the intermediate tensor collection individual for each thread.
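   The last point (per-thread intermediate tensors) can be sketched as a small self-contained C++ example. `PerThreadScratchpad` is a hypothetical illustration of the pattern, not a class from this patch: each thread calling Run() resolves to its own buffer, so the shared runtime object is never mutated concurrently.

   ```cpp
   #include <cstddef>
   #include <mutex>
   #include <thread>
   #include <unordered_map>
   #include <vector>

   // Hypothetical sketch: one scratch buffer per calling thread, created
   // lazily on first use. The map itself is guarded by a mutex, but each
   // thread only ever writes into its own buffer.
   class PerThreadScratchpad {
    public:
     explicit PerThreadScratchpad(size_t bytes) : bytes_(bytes) {}

     // Returns the buffer owned by the calling thread.
     std::vector<char>& Get() {
       std::lock_guard<std::mutex> lock(mu_);
       auto& buf = buffers_[std::this_thread::get_id()];
       if (buf.empty()) buf.resize(bytes_);
       return buf;
     }

     size_t NumThreadsSeen() {
       std::lock_guard<std::mutex> lock(mu_);
       return buffers_.size();
     }

    private:
     size_t bytes_;
     std::mutex mu_;
     std::unordered_map<std::thread::id, std::vector<char>> buffers_;
   };
   ```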
   
   Other improvements:
    * Zero-copy handling of input/output tensors.
    * Use the query API to ask DNNL for its desired layouts; prevent the use of 
unoptimised kernels.
    * Automatic injection of reorder primitives when layouts do not match.
    * Support for different data types (int8, uint8, int32). Eliminate all 
hardcoded "fp32" values.
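   The layout-query and reorder-injection points can be illustrated with a backend-agnostic sketch. In real DNNL this corresponds to creating a primitive descriptor with `format_tag::any` and inserting a `dnnl::reorder` when the queried layout differs from the tensor's actual layout; the names below (`QueryPreferredLayout`, `BuildPlan`) are illustrative stand-ins, not the patch's API.

   ```cpp
   #include <string>
   #include <vector>

   // One step of an execution plan: an op executed with a given layout.
   struct Step {
     std::string op;
     std::string layout;
   };

   // Stand-in for the DNNL query API: ask which layout a primitive prefers.
   // A convolution kernel typically prefers a blocked layout such as NCHW8c.
   std::string QueryPreferredLayout(const std::string& op) {
     return op == "conv2d" ? "NCHW8c" : "NCHW";
   }

   // Build a plan for one op: if the incoming layout does not match the
   // preferred one, inject an explicit reorder step in front of the op.
   std::vector<Step> BuildPlan(const std::string& op,
                               const std::string& in_layout) {
     std::vector<Step> plan;
     std::string want = QueryPreferredLayout(op);
     if (want != in_layout)
       plan.push_back({"reorder", in_layout + "->" + want});  // injected reorder
     plan.push_back({op, want});
     return plan;
   }
   ```

   With this scheme the convolution always runs in its preferred blocked layout, and the reorder cost is paid only when the producer's layout actually differs.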
   
   **Details**
   Introduced indirect handling of memory objects via two new classes, 
TensorRequisite and TensorRegistry.
   TensorRequisite describes a sequence of transformations on top of some source 
tensor.
   TensorRegistry matches tensors described by a TensorRequisite to real 
dnnl::memory objects. 
   
   This concept allows us to:
   * Decouple primitive arguments from real memory objects. Arguments are 
matched to memory objects on demand, depending on the execution context 
(thread id, arguments of the Run() method).
   * Ignore the nature of a tensor. Constant weights, intermediate tensors 
and external input/output tensors are processed identically.
   
   Some pseudo code to demonstrate the concept:
   ```
   DLTensor src_mem = {5, 2, 128, 128, 8};  // 5D tensor
   
   // Describe sequence of layout transformation
   auto tr = TensorRequisite.AsIs(src_mem, eid);  // 5D
   tr = tr.treatAs("ABCD8b");  // 4D
   tr = tr.permute({0, 2, 3, 1});  // permute axes NCHW -> NHWC
   tr = tr.crop({1, 128, 128, 16}, {0, 0, 0, 0});  // extract first batch
   tr = tr.squeeze();
   
   // register TR
   TensorRegistry t_reg;
   auto t_id = t_reg.register(tr);
   t_reg.finalize();
   
   // Obtain dnnl::memory object inside of method Run()
   auto mem = t_reg.solve(t_id, ext_io_provider);
   ```
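   To make the chain above concrete, here is a shape-level mock of the same transformation sequence (not the actual TensorRequisite implementation; each helper is a hypothetical stand-in that just computes the resulting logical shape of one step):

   ```cpp
   #include <vector>

   using Shape = std::vector<long>;

   // "ABCD8b": fold the trailing 8-wide block into dim B,
   // e.g. 5D {N, C, H, W, 8} -> 4D {N, C*8, H, W}.
   Shape TreatAsBlocked(const Shape& s) {
     Shape out(s.begin(), s.end() - 1);
     out[1] *= s.back();
     return out;
   }

   // Reorder axes, e.g. {0, 2, 3, 1} maps NCHW -> NHWC.
   Shape Permute(const Shape& s, const std::vector<int>& perm) {
     Shape out(s.size());
     for (size_t i = 0; i < perm.size(); ++i) out[i] = s[perm[i]];
     return out;
   }

   // Crop keeps only the new extents (offsets omitted in this mock).
   Shape Crop(const Shape&, const Shape& dims) { return dims; }

   // Drop all size-1 dimensions.
   Shape Squeeze(const Shape& s) {
     Shape out;
     for (long d : s)
       if (d != 1) out.push_back(d);
     return out;
   }
   ```

   Running the chain on the example tensor: {5, 2, 128, 128, 8} becomes {5, 16, 128, 128} after the blocked reinterpretation, {5, 128, 128, 16} after the permute, {1, 128, 128, 16} after the crop, and finally {128, 128, 16} after the squeeze.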

