yangulei commented on PR #11642:
URL: https://github.com/apache/tvm/pull/11642#issuecomment-1157200163

   Hi apeskov,
   
   I'm working on zero-copy in DNNL which may relate to the in-place primitives 
you mentioned:
   > 4. Added support of "quasi in-place" primitives. Currently that is stub 
and in-place behavior is simulated via src->dst copy. After update of 
"MemoryPlan" with in-place support this simulation in runtime will be switched 
off.
   
   I tried to enable zero-copy when tensors are read or written by DNNL 
primitives, by assigning the handle of the DNNL memory to the TVM buffer before 
the primitives execute. It works for most of the CV models I have tested, but 
produces wrong results when:
   - TVM `add` is converted to a `post-op sum` in DNNL, and
   - one of the inputs has a non-in-place layout transform ahead of the `add`.
   
   In my understanding, `post-op sum` is more like an accumulation, which 
requires one of the inputs and the output to share the same buffer, and the 
non-in-place op before `add` breaks this requirement. I tried to replace 
`post-op sum` with `post-op binary add`; the results are then correct, but DNNL 
may fall back to a `ref:any` implementation with terrible performance.
   I couldn't find a solution that ensures both correctness and optimal 
performance. Do you have any idea about this? Is a memory copy inevitable in 
this scenario?
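   To make the failure mode concrete, here is a minimal NumPy sketch of the 
semantic difference (the function names are illustrative, not DNNL API): 
`post-op sum` accumulates into the destination buffer, so it is only correct if 
that buffer already holds the other addend, while `binary add` is a pure 
two-input op that works regardless of which buffer the addend lives in.

```python
import numpy as np

def conv_with_postop_sum(conv_out, dst):
    # Models DNNL post-op "sum": the primitive accumulates into dst,
    # so dst must already contain the second addend of the TVM `add`.
    dst += conv_out          # in-place: dst = dst + conv_out
    return dst

def conv_with_binary_add(conv_out, other):
    # Models DNNL binary post-op "add": a pure two-input add; `other`
    # may live in any buffer (but may hit a slower kernel in practice).
    return conv_out + other

conv_out = np.array([1.0, 2.0, 3.0])
addend = np.array([10.0, 20.0, 30.0])

# Zero-copy is fine when the addend already sits in the dst buffer:
dst = addend.copy()
print(conv_with_postop_sum(conv_out, dst))    # [11. 22. 33.]

# But if a non-in-place layout transform writes the addend into a
# *new* buffer, dst no longer holds it and post-op sum reads stale data:
dst = np.zeros(3)                             # stale dst buffer
print(conv_with_postop_sum(conv_out, dst))    # wrong: [1. 2. 3.]

# Binary add stays correct no matter which buffer holds the addend:
print(conv_with_binary_add(conv_out, addend))  # [11. 22. 33.]
```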

