apeskov opened a new pull request, #11345:
URL: https://github.com/apache/tvm/pull/11345
**Summary**
Several limitations prevent the DNNL runtime from being used in
multi-instance mode. This patch eliminates some of them.
Multi-instance mode means calling the Run() method concurrently from several
threads on a single instance of DNNLJSONRuntime.
Changes made specifically for multi-instance support:
* Do not modify DNNLJSONRuntime fields from the Run method, so that it
behaves like a "const" method.
* Use explicit DNNL scratchpads where they are requested.
* Give each thread its own collection of intermediate tensors.
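As a rough illustration of the first and third points, here is a minimal sketch of a runtime whose Run() is const and whose intermediate buffer is per-thread. `ToyRuntime` and `RunConcurrently` are hypothetical names invented for this example, and `thread_local` storage is only one possible way to get per-thread intermediates; it is not necessarily how this patch does it.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a compiled runtime; NOT the actual DNNLJSONRuntime.
class ToyRuntime {
 public:
  explicit ToyRuntime(std::size_t scratch_elems) : scratch_elems_(scratch_elems) {
    assert(scratch_elems > 0);
  }

  // Run() is const: it only reads members, so concurrent calls do not race.
  float Run(const std::vector<float>& input) const {
    // One intermediate buffer per calling thread (assumption of this sketch).
    // It is reinitialised on every call, so no state leaks between calls.
    thread_local std::vector<float> scratch;
    scratch.assign(scratch_elems_, 0.0f);
    for (std::size_t i = 0; i < input.size(); ++i) {
      scratch[i % scratch_elems_] += input[i];
    }
    float sum = 0.0f;
    for (float v : scratch) sum += v;
    return sum;
  }

 private:
  const std::size_t scratch_elems_;
};

// Calls Run() on one shared instance from n_threads threads.
std::vector<float> RunConcurrently(const ToyRuntime& rt,
                                   const std::vector<float>& input,
                                   int n_threads) {
  std::vector<float> results(n_threads, 0.0f);
  std::vector<std::thread> workers;
  for (int t = 0; t < n_threads; ++t) {
    // Each worker writes to its own slot of `results`.
    workers.emplace_back([&rt, &input, &results, t] { results[t] = rt.Run(input); });
  }
  for (auto& w : workers) w.join();
  return results;
}
```

Because Run() cannot write to members, the compiler rejects accidental mutation of shared state, which is the practical benefit of treating it as a "const" method.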
Other improvements:
* Zero-copy handling of input/output tensors.
* Use the query API to ask DNNL for its desired layouts. This prevents
unoptimised kernels from being used.
* Automatic injection of reorder primitives if the layout doesn't match.
* Support for different data types (int8, uint8, int32). Eliminate all
hardcoded "fp32" values.
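The layout query and reorder injection can be pictured with a small library-free model. `Step` and `PlanPrimitive` are hypothetical names, and layouts are plain strings here; in oneDNN itself the comparison is typically made between the memory descriptor the primitive asks for and the one the user supplies.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one step in an execution plan.
struct Step {
  std::string kind;    // "reorder" or "primitive"
  std::string layout;  // layout the step works in
};

// Builds the plan for one primitive. If the layout the caller provides
// differs from the layout the library prefers, a reorder step is
// prepended; otherwise the data is consumed directly.
std::vector<Step> PlanPrimitive(const std::string& provided_layout,
                                const std::string& preferred_layout) {
  assert(!provided_layout.empty() && !preferred_layout.empty());
  std::vector<Step> plan;
  if (provided_layout != preferred_layout) {
    plan.push_back({"reorder", preferred_layout});
  }
  plan.push_back({"primitive", preferred_layout});
  return plan;
}
```

When the layouts already match, the plan contains only the primitive, so no extra copy is paid.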
**Details**
Introduced indirect handling of memory objects. The new objects are
TensorRequisite and TensorRegistry.
TensorRequisite - describes a sequence of transformations on top of some
source tensor.
TensorRegistry - implements the matching of tensors described by a
TensorRequisite to real dnnl::memory objects.
This concept makes it possible to:
* Decouple primitive arguments from real memory objects. Arguments are
matched to memory objects on demand, depending on the execution context
(thread id, arguments of the Run method).
* Ignore the nature of a tensor. Constant weights, intermediate tensors
and external inputs/outputs are processed identically.
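To make the decoupling concrete, the following is a minimal sketch of on-demand resolution, assuming a much simplified registry. `ToyRegistry`, `Kind`, `Entry` and `ExtProvider` are hypothetical and do not mirror the real TensorRegistry interface. The registry stores only descriptions, and the caller supplies the per-thread scratch pool and the external buffers at resolution time.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model; NOT the actual TensorRegistry API.
enum class Kind { kConstant, kIntermediate, kExternal };

struct Entry {
  Kind kind;
  std::size_t index;  // slot in the constant, scratch or external list
};

// Maps an external tensor index to a buffer passed to Run().
using ExtProvider = std::function<float*(std::size_t)>;

class ToyRegistry {
 public:
  // Constants are referenced, not owned, to keep the sketch short.
  explicit ToyRegistry(std::vector<float*> constants)
      : constants_(std::move(constants)) {}

  // Records a tensor description and returns its id.
  std::size_t Register(Kind kind, std::size_t index) {
    entries_.push_back({kind, index});
    return entries_.size() - 1;
  }

  // Resolves an id to a real buffer. The method is const: everything
  // that varies per call (scratch pool, external buffers) is passed in.
  float* Solve(std::size_t id, std::vector<std::vector<float>>& scratch,
               const ExtProvider& ext) const {
    assert(id < entries_.size());
    const Entry& e = entries_[id];
    switch (e.kind) {
      case Kind::kConstant:
        return constants_[e.index];
      case Kind::kIntermediate:
        return scratch[e.index].data();
      case Kind::kExternal:
        return ext(e.index);
    }
    return nullptr;  // unreachable for valid entries
  }

 private:
  std::vector<float*> constants_;
  std::vector<Entry> entries_;
};
```

Code that consumes the returned pointer does not need to know which of the three kinds it came from, which is the uniform treatment described above.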
Some pseudo code to demonstrate the concept:
```
DLTensor src_mem = {5, 2, 128, 128, 8}; // 5D tensor
// Describe sequence of layout transformations
auto tr = TensorRequisite::AsIs(src_mem, eid); // 5D
tr = tr.treatAs("ABCD8b"); // 4D
tr = tr.permute({0, 2, 3, 1}); // permute axes NCHW -> NHWC
tr = tr.crop({1, 128, 128, 16}, {0, 0, 0, 0}); // extract first batch
tr = tr.squeeze();
// register TR
TensorRegistry t_reg;
auto t_id = t_reg.register(tr);
t_reg.finalize();
// Obtain dnnl::memory object inside of method Run()
auto mem = t_reg.solve(t_id, ext_io_provider);
```
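The shape bookkeeping implied by that chain can be checked with a small standalone model. All helper names (`MergeBlock`, `Permute`, `Crop`, `Squeeze`) are hypothetical. The sketch assumes that treating the 5D tensor as "ABCD8b" folds the trailing block of 8 into axis B, and that a crop takes one offset per axis.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

// Folds the trailing block dimension into `axis`,
// e.g. {5, 2, 128, 128, 8} with axis = 1 -> {5, 16, 128, 128}.
Shape MergeBlock(const Shape& s, std::size_t axis) {
  assert(s.size() >= 2 && axis + 1 < s.size());
  Shape out(s.begin(), s.end() - 1);
  out[axis] *= s.back();
  return out;
}

// Reorders axes: output axis i takes its size from input axis perm[i].
Shape Permute(const Shape& s, const std::vector<std::size_t>& perm) {
  assert(perm.size() == s.size());
  Shape out;
  for (std::size_t p : perm) {
    assert(p < s.size());
    out.push_back(s[p]);
  }
  return out;
}

// Takes a window of size `dims` starting at `offsets` (one per axis).
Shape Crop(const Shape& s, const Shape& dims, const Shape& offsets) {
  assert(dims.size() == s.size() && offsets.size() == s.size());
  for (std::size_t i = 0; i < s.size(); ++i) {
    assert(offsets[i] >= 0 && offsets[i] + dims[i] <= s[i]);
  }
  return dims;
}

// Drops every axis of size 1.
Shape Squeeze(const Shape& s) {
  Shape out;
  for (int64_t d : s) {
    if (d != 1) out.push_back(d);
  }
  return out;
}
```

Under these assumptions the shapes evolve as {5, 2, 128, 128, 8}, then {5, 16, 128, 128}, then {5, 128, 128, 16}, then {1, 128, 128, 16}, and finally {128, 128, 16}.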