yangulei commented on PR #11642: URL: https://github.com/apache/tvm/pull/11642#issuecomment-1157200163
Hi apeskov, I'm working on zero-copy in DNNL, which may relate to the in-place primitives you mentioned:

> 4. Added support of "quasi in-place" primitives. Currently that is stub and in-place behavior is simulated via src->dst copy. After update of "MemoryPlan" with in-place support this simulation in runtime will be switched off.

I tried to enable zero-copy when tensors are read or written by DNNL primitives, by assigning the handle of the DNNL memory to the TVM buffer before the primitives execute. This works for most of the CV models I have tested, but produces wrong results when:
- TVM `add` is converted to a `post-op sum` in DNNL, and
- one of the inputs of `add` has a non-in-place layout transform ahead of it.

As I understand it, `post-op sum` is an accumulation: it requires one addend and the output of `add` to share the same buffer, and the non-in-place op before `add` breaks this requirement. I tried replacing `post-op sum` with `post-op binary add`; the results are then correct, but DNNL may fall back to `ref:any` with terrible performance.

I haven't found a solution that ensures both correctness and optimal performance. Do you have any ideas? Is a memory copy inevitable in this scenario?
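To make the aliasing requirement concrete, here is a minimal NumPy sketch of the semantics (this is not the oneDNN API; `post_op_sum` and `binary_add` are hypothetical stand-ins for the two post-op behaviors). `post-op sum` accumulates into the destination buffer, so the destination must already hold the other addend; `binary add` reads the other addend from its own buffer.

```python
import numpy as np

def post_op_sum(primitive_out, dst):
    # Mimics DNNL's post-op sum: accumulation into dst (dst += result).
    # Correct only if dst already holds the other addend of `add`.
    dst += primitive_out
    return dst

def binary_add(primitive_out, other):
    # Mimics DNNL's binary post-op: the other addend lives in its own
    # buffer, so there is no aliasing requirement.
    return primitive_out + other

conv_out = np.array([1.0, 2.0])   # output of the fused primitive
skip = np.array([10.0, 20.0])     # the other input of TVM `add`

# Zero-copy works when the primitive's dst buffer aliases the skip input:
dst = skip.copy()                 # dst holds the second addend up front
ok = post_op_sum(conv_out, dst)   # [11. 22.], same as binary_add

# If a non-in-place layout transform wrote `skip` into a fresh buffer,
# dst no longer holds that addend and the accumulation reads stale data:
stale_dst = np.zeros_like(skip)
wrong = post_op_sum(conv_out, stale_dst)  # [1. 2.], stale zeros summed
```

This is what I believe happens in the failing models: the layout transform breaks the alias between the `add` input and the primitive's destination, so `post-op sum` accumulates over stale memory, while `binary add` stays correct at the cost of possible `ref:any` fallback.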
